jrun connection closed errors


Joe Copley

Nov 16, 2012, 5:27:33 PM
to fusion...@googlegroups.com
My win2003/CF7 production server is plagued by intermittent jrun closed connection errors. When these occur, they occur for every CF request and persist until the server is rebooted or the jrun process is terminated in Task Manager. (Attempting to restart CF from the services list never works.)

I installed FR and implemented crash protection against long request times and low memory, in the hope that it could help diagnose this problem.

I do get occasional CP notices when the server is running, so I know that it is working. But there were no triggers sent before the crash that indicated any problems. According to my IIS logs, the last successful .cfm return status was at 2:43 AM on 11/15, and then there were no logged requests until 3:06 AM; from that point on they were logged with status 503, which I assume is when the jrun closed connection errors started appearing to users. The condition persisted until 7:30 AM, when the jrun process was terminated.

I checked the FR log archive and found that there were no log files dated between 2 AM and 7 AM, which means that whatever made the server hang up prevented FR from logging the 47 minutes leading up to the problem. The FRAM heartbeat log showed it running serenely through the whole episode.

The last CP notice before the hang was triggered at 1:52AM that morning, a low memory alert for a routine maintenance script. This was nearly an hour before the server hung up, so I doubt it is related. The coldfusion-out.log appears to have a thread dump corresponding to the 1:52 CP alert, then nothing until the server rebooted at 7:30.

This is frustrating because the very reason I installed FR was to help me fix this problem. But FR will not run when this condition exists. FRAM will run, but doesn't give me useful information so far as I can find. Is there anything FR can do to help with this problem that I am missing?

Thanks in advance,

Joe Copley

Charlie Arehart

Nov 16, 2012, 11:49:18 PM
to fusion...@googlegroups.com

Joe, sorry to hear of your challenge. And while FR can and should be able to help, it can’t in all cases. Still, I do think there’s more you can leverage than you may realize.

First, though, let’s clarify that the FRAM heartbeat log is of no value in this case. That’s watching the FRAM instance, not your CF instance. You really have almost no need to watch the FRAM logs. Watch instead the CF instance’s logs ([fr]\instance\[instancename]\log). Of special value can be the resource-0.log (or other numbers, as it fills and rolls over). See the columns after the REQ column, which give insight into the number of requests running and completing at each 5-second interval. If that requests-running count ever hits your limit for “maximum simultaneous requests” on the CF Admin “Request Tuning” page, no more requests can run, which can make it seem that CF is hung. See the online help (in the FR interface) for more details on that log. There are also additional troubleshooting resources on the FR support site to help with interpreting the info FR can provide.
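As a rough illustration of reading that log, here is a hedged Python sketch. The exact resource-0.log column layout and delimiter vary by FR version, so the column index below is an assumption; check the header row of the log on your own install before relying on it.

```python
# Hedged sketch: flag 5-second intervals where the "requests running" count
# nears the CF Admin limit. COL_RUNNING and the whitespace delimiter are
# assumptions -- verify them against the header of your own resource-0.log.

MAX_SIMULTANEOUS = 20   # your CF Admin "Request Tuning" limit
THRESHOLD = 18          # warn when running requests reach this
COL_RUNNING = 4         # assumed 0-based index of the "requests running" column

def near_limit_intervals(lines, threshold=THRESHOLD, col=COL_RUNNING):
    """Return (timestamp, running) pairs for intervals at or over the threshold."""
    hits = []
    for line in lines:
        parts = line.split()
        if len(parts) <= col:
            continue  # skip headers and blank lines
        try:
            running = int(parts[col])
        except ValueError:
            continue  # skip non-numeric columns (header rows)
        if running >= threshold:
            hits.append((parts[0], running))
    return hits
```

Run over the file with `near_limit_intervals(open("resource-0.log"))`; any hits show you exactly when the server was at or near its request ceiling.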

Second, you don’t need to rely on the logs, as the FR CP alerts can notify you as well. You mention that you had set up the long-running and low-memory alerts, and it’s great that you’re leveraging them. Many miss them (and the docs above), so they are often not fully leveraging some of FR’s best troubleshooting capabilities.

Still, I’d argue that the generally far more valuable alert is the one you did not set up (or at least did not mention): running requests. Let’s say your Max Simultaneous Requests number was set to 20. You ought to set this running-requests alert to 18, so that if 18 requests are ever running at once, you get an alert.

If you get that alert, and the requests are taking a long time, this will be useful for diagnosing why they were running long, because again, if the count hits 20 (if that’s your limit in the CF Admin), then no other requests can run until there are free slots under that 20 limit. The alert includes a thread dump, with details for each running request, which will often help identify and resolve what’s making requests hang up. Again, there’s info in the docs and support resources for interpreting stack traces.

And if you get lots of running-requests alerts at 18, but they always show the requests being only seconds old or younger, then that’s a sign that the limit of 20 isn’t big enough for your server, so you ought to increase it. Despite the old rule of thumb about setting it to 2-3 times the number of CPUs, I find that has little connection to reality, and some people can run 50 or 100 requests at once. The key is what the requests are doing. If most are running CFML that’s sitting around waiting for some resource (a database call, a CFHTTP request, a web service call), then having more running at once consumes few resources. Again, the alert can clue you in to what the requests are doing when many are running at once. (Again, too, if they are NOT running long when you get the alert, just ignore it until you get one that DOES show requests running a long time.)

Finally, you mention FR stopped writing its logs, and you got a memory alert. First, what is your memory alert value set to? Hopefully it’s something like 20 (meaning alert when 20% is free), not 80 (which some do by mistake, thinking it means when 80% is used). Second, you refer to the CF out log. Good for you for knowing about that. You say you see it showing an FR thread dump. More important, do you see any outofmemory error in that log? Search the file for that string (outofmemory). (You may find it hard to sift through the logs, because they fill with the dumps that FR writes there on each CP alert or on each stack trace done in the interface; note that it no longer does that as of FR 4, so you may want to upgrade.)
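To make that search concrete, here is a small Python sketch for sifting the out log. The helper names are mine, and the exact log-line format is an assumption; a plain text search for “outofmemory” in any editor works just as well.

```python
# Hedged sketch: pull OutOfMemoryError lines out of coldfusion-out.log and
# extract whatever detail follows the error name (e.g. "Java heap space").
# The filename comes from this thread; adjust the path for your install.

def find_oom_lines(lines):
    """Return lines mentioning OutOfMemoryError, matched case-insensitively."""
    return [ln.rstrip() for ln in lines if "outofmemory" in ln.lower()]

def oom_kinds(lines):
    """Best-effort extraction of the OOM detail (heap space, native thread, etc.)."""
    kinds = []
    for ln in find_oom_lines(lines):
        _, _, detail = ln.partition("OutOfMemoryError")
        kinds.append(detail.lstrip(": ").strip() or "(no detail on this line)")
    return kinds
```

If `oom_kinds` reports only “(no detail on this line)”, keep searching the surrounding lines by time, as discussed below; older JVMs sometimes put the detail elsewhere or omit it.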

It’s vital to know if an OOM condition is happening, and if so, what kind of outofmemory error it is.

If the OOM error says “heap space” or “GC overhead limit exceeded”, that would mean that your heap is indeed filling. As tempting as it may be to increase your heap, there may be some other explanation for why it’s filling. Perhaps it’s a high rate of session creation, or heavy use of the query cache, etc. If you have the FR Extension for CF (FREC, free for use with FR 4 and above, another reason to upgrade for those still on FR 3), it creates yet another FR log, the realtimestats.log, which tracks things such as the count of sessions and the size of the query cache, which can be valuable. Even if the logs stop being written, you may see an upward trend before that. Not even the CF Enterprise Server Monitor (in CF 8 and above) logs this sort of info, so it’s a powerful weapon in the FR arsenal.

(Also, if the heap is filling, you may not even be able to increase its size if you’re still on a 32-bit OS, where you can’t raise it much above about 1.3 GB.)

If the OOM error says instead “unable to create new native thread”, or “out of swap space”, that may well mean that you should lower the heap, as each of those can mean that there’s pressure either within the address space (in the first case) or within the OS (in the second case, which may happen if other apps are running on the box and competing with CF for overall system memory.)
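For reference, the heap settings for CF7 on JRun live in jvm.config. The path below assumes a standalone CF install, and the values are purely illustrative, not a recommendation:

```
# {cf root}\runtime\bin\jvm.config -- illustrative values only.
# Keep any existing -D flags already on the java.args line.
# On a 32-bit OS the practical -Xmx ceiling is roughly 1.2-1.3 GB,
# so change the heap in small steps and retest rather than guessing big.
java.args=-server -Xms512m -Xmx512m
```

Whether to raise or lower -Xmx depends on which kind of OOM you are seeing, per the two cases above.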

So no, in some cases problems cannot be “solved” by FR, nor can some even be “detected” or “protected against” by FR (such as some of these OOM errors). But I hope that the info above helps as you explore the problem.

And if you find it challenging to interpret the info, or are pressed for time and need more immediate, direct assistance, note that consulting support is available from Intergral, the makers of FR, at cfconsultant.com. (I’m one of the consultants who may help you if you acquire that service. I’m not an FR employee but an independent consultant who also helps people solve CF server problems, and I work with FR in that capacity nearly every day.)

That said, we here on the list (the Intergral folks and others) enjoy helping each other out, so if you have more questions, fire away.

/charlie

--
You received this message because you are subscribed to the Google Groups "FusionReactor" group.
To view this discussion on the web visit https://groups.google.com/d/msg/fusionreactor/-/DFPnBBdIZKUJ.
To post to this group, send email to fusion...@googlegroups.com.
To unsubscribe from this group, send email to fusionreacto...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/fusionreactor?hl=en.

Joe Copley

Nov 17, 2012, 4:58:23 PM
to fusion...@googlegroups.com
Charlie:

Thanks for the response, it is very helpful.
  • I do have the max request trigger set at 10, but I realized that Max Requests was set to 8, so it never fired. I will take your advice and try a higher limit with an appropriate CP trigger.
  • My memory alert value is set at 15%.
  • I do find a few java.lang.OutOfMemoryError entries in the coldfusion-err.log and the -event log, but the nearest ones to the hang occurred on 11/5 and 11/16. Also, these memory errors are followed by a stack trace starting with java.lang.IllegalStateException.
  • my heap size is 512MB
  • I am running FR 4.5.2. I have installed the CF extensions, I think, but it's not clear what effect they are having. For instance, I cannot locate realtimestats.log

Joe



Charlie Arehart

Nov 18, 2012, 12:31:31 AM
to fusion...@googlegroups.com

Great to hear that it was helpful.

That said, it’s generally critical that you clarify exactly WHAT follows the outofmemory errors (on the same line, whether “heap space”, “GC overhead limit exceeded”, “out of swap space”, etc.). If you see nothing on the same line, keep searching the log around the time of the error for another line which should identify that. (Then again, you are on CF 7, which is Java 1.4, and I seem to recall that sometimes that version of the JVM did not offer details in some cases.)

It may be that on such an old JVM you have to guess at what the problem could be. Again, the realtimestats.log (from FREC) could perhaps help spot some common problems (high sessions, high query caching), or the lack of them could point to a different problem. You could also try raising the heap, but since you don’t know for sure whether you are hitting that limit, raising it could cause other problems. How much total memory do you have on the box? And under a steady state, how much memory is CF using, and more important, how much does the OS report as unused?

As for being unable to locate realtimestats.log, it’s important to know whether you added FREC to the FR instance within CF. Do you recall being asked that? And did you restart CF since doing that? (It does not happen on its own, the way the main FR installer handles things.) You can look at the Active Plugins page of the FR interface (in 4+) and see if it lists the FR CF plugin (among the few that will be shown).

 

/charlie


Joe Copley

Nov 19, 2012, 5:13:09 PM
to fusion...@googlegroups.com
Hi Charlie:

I do believe I am running JVM 1.4. Here is a sample from my -err.log:

11/16 15:28:48 error
java.lang.OutOfMemoryError

java.lang.IllegalStateException
    at jrun.servlet.JRunResponse.getWriter(JRunResponse.java:193)
    at jrun.servlet.JRunResponse.sendError(JRunResponse.java:564)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:299)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:541)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:204)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:318)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:426)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:264)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
11/16 16:53:18 error Cannot create cookie: expires = Sun, 09-Nov-2042 21:53:17 GMT
11/16 17:59:14 error Cannot create cookie: expires = Sun, 09-Nov-2042 22:58:48 GMT
11/16 17:59:14 error Cannot create cookie: path = /
11/16 17:59:14 error Cannot create cookie: expires = Sun, 09-Nov-2042 22:58:48 GMT
11/16 17:59:14 error Cannot create cookie: path = /

If I read this correctly, it is not telling me the specific source of the memory error.

Also, you said,

" it’s important to note if you added that to the FR instance within CF."

I'm not sure what this means. This appears on my plugins page in FR:


FusionReactor ColdFusion Plugin
[fr-coldfusion-plugin-1.0.1.jar]
Intergral Information Solutions GmbH
This plugin provides access to the ColdFusion log files and Server Monitoring API


There is a config button and a Logfile button for the plug-in. The Logfile button shows no logs, and the config page has only a single drop-down, which is set to enabled. I have restarted CF many times since this plug-in was installed.

By the way, my server hasn't crashed again since I last wrote, but I did set the max requests to 20 and the CP trigger to 18. I did get a CP notice this morning, and my server logs show several slow requests at about that time, but the server did not crash. The stack trace in the email showed several jrpp threads in runnable state, so I assume that means the server was not queuing requests.

Many threads also looked like this:

-------------------------------------------------------------------------------

Thread ID:    jrpp-381
Priority:     5
Hashcode:     28140770

"jrpp-381" prio=5 tid=0x084ce010 nid=0x1654 waiting for monitor entry [0x3e11f000..0x3e11fd94]
    at coldfusion.server.j2ee.pool.ObjectPool.updateLocked(ObjectPool.java:189)
    - waiting to lock <0x13ee71f8> (a coldfusion.server.j2ee.sql.pool.JDBCPool)
    at coldfusion.server.j2ee.sql.JRunConnection.touch(JRunConnection.java:256)
    at coldfusion.server.j2ee.sql.JRunConnection.createStatement(JRunConnection.java:327)
    at coldfusion.server.j2ee.sql.JRunConnectionHandle.createStatement(JRunConnectionHandle.java:67)
    at coldfusion.sql.Executive.executeQuery(Executive.java:720)
    at coldfusion.sql.Executive.executeQuery(Executive.java:685)
    at coldfusion.sql.Executive.executeQuery(Executive.java:646)
    at coldfusion.sql.SqlImpl.execute(SqlImpl.java:236)
    at coldfusion.tagext.sql.QueryTag.doEndTag(QueryTag.java:500)
    at cfgetglobaloptions2ecfm1288602880.runPage(E:\inetpub\clientsites\ca\cf\getglobaloptions.cfm:2)
    at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:152)
    at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:349)
    at coldfusion.runtime.CfJspPage._emptyTag(CfJspPage.java:1915)
    at cffiberglass2ecfm1521387890.runPage(E:\inetpub\clientsites\ca\cf\fiberglass.cfm:17)
    at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:152)
    at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:349)
    at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
    at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:225)
    at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:51)
    at coldfusion.filter.PathFilter.invoke(PathFilter.java:86)
    at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:69)
    at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:52)
    at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
    at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
    at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
    at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
    at coldfusion.filter.RequestThrottleFilter.invoke(RequestThrottleFilter.java:115)
    at coldfusion.CfmServlet.service(CfmServlet.java:107)
    at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:78)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
    at com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doHttpServletRequest(FusionReactorCoreFilter.java:503)
    at com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doFusionRequest(FusionReactorCoreFilter.java:337)
    at com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doFilter(FusionReactorCoreFilter.java:246)
    at com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionReactorFilter.java:121)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
    at jrun.servlet.FilterChain.service(FilterChain.java:101)
    at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:91)
    at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:257)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:541)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:204)
    at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:318)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:426)
    at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:264)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

-------------------------------------------------------------------------------

I take this to mean that it is waiting on the database. If I am seeing a lot of these in a thread dump, is it reasonable to assume that the database server may be a potential bottleneck? And if all available threads went into a wait state for the database, might that cause a jrun closed connection error as I have described?

Regards,

Joe

Charlie Arehart

Nov 20, 2012, 12:04:55 PM
to fusion...@googlegroups.com

Joe, a few things.

First, as for your FREC logs, maybe there is some limit to how well they work with JVM 1.4. I’ll have to leave it to the FR folks to reply to that.

Second, rather than focus on “all the threads” (as it seems you may be doing as you look for problems in your alerts), I’ll suggest something that may help you or other readers following this thread.

You can make the job of analysis a lot easier if you focus only on the thread ids that correspond to those listed at the top of the FR CP Alert, where it shows what requests were running, and for how long. Focus on those running longest, as they are listed in descending duration order.

Get the requestid (the left-most number in the lines showing each running request at the top) for a long-running request (running for at least a few seconds according to the report), then do a find (within the alert) for that number, which will generally take you to the bottom of the alert, where you can see those same requests again, but now as a set of about a dozen lines with details for each request.

One line is the requestid. Note the “execution time”, and again, as long as it’s more than a few seconds, note the threadId (another line in that report), then search the alert for that threadId, which will take you back up into the middle of the alert, where the stack traces for each thread are listed (there will be hundreds, but we’re only interested in the few associated with long-running CF requests). Then focus on what the stack trace (like the one you showed below) says for at least those long-running requests.
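For readers who would rather script that lookup than do it by hand, here is a rough Python sketch. The CP alert layout varies by FR version, so the tuple fields and the blank-line-separated trace blocks below are assumptions, not the documented format; treat this as pseudocode to adapt to what your own alerts contain.

```python
# Hedged sketch of the manual procedure above: keep only the requests running
# longer than a cutoff, then pull the stack-trace block for each one's thread.
# The (request_id, thread_id, seconds) shape and the blank-line block
# separator are hypothetical -- adapt them to your alert's actual layout.

def long_runners(running, cutoff_seconds=5):
    """running: (request_id, thread_id, seconds) tuples from the alert header.
    Returns the entries at or over the cutoff, longest first."""
    hits = [r for r in running if r[2] >= cutoff_seconds]
    return sorted(hits, key=lambda r: r[2], reverse=True)

def trace_for(alert_text, thread_id):
    """Return the first blank-line-separated block mentioning a thread id."""
    for block in alert_text.split("\n\n"):
        if thread_id in block:
            return block
    return None
```

The point of the cutoff is the same as in the manual procedure: ignore the sub-second requests and chase only the long runners.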

And yes, you might find many in that state, and that may suggest a problem with the DB, but you may sometimes find that the longest-running request is the real culprit and the rest are hurt because of it (or not). I’ll also note that your stack trace actually shows it waiting for a lock within the connection pool. That could indicate that it’s not “waiting for the database” (yet) but instead “waiting for a connection from the connection pool” (though not necessarily), which might suggest all the more that there is one longer-running request that started things.

I’ll note as well that there can be another thread doing DB I/O that would NOT be listed at the top of the FR CP alert: a scheduler-nn thread (where nn is some number). Scheduler threads are NOT related to CF scheduled tasks (as some would reasonably think) but are background tasks that CF runs to do various things. One of those is the client-storage purge, which many people set the CF Admin to use a database for; in that case there could be a long-running scheduler-nn thread causing some contention (for the DB, or less likely, for the connection pool). Just throwing that out for you and others as you try to read CP alerts and solve problems.

Indeed, along those lines, note that sometimes you may even want to ignore a CP alert. Let’s say you told FR to fire the “running requests” alert when a certain number of requests are running (in your case, 18). Just because you get the alert doesn’t mean there’s any problem at all: you may get an alert showing several or even dozens of requests “running”, but if you look closely at the list of them at the top, you may see that they have all been running for less than a second or a few seconds. In that case, I’d argue there may be no real value in looking further into what they are doing. Just let it go and wait until you get an alert showing some long-running requests, and then look into those.

Finally, I’ll note that your error log below shows some cookie-related errors, which on the surface may seem innocuous, but they sometimes reflect the fact that a bunch of automated requests from search-engine spiders and bots (or other automated tools) made a flood of requests. It could be that your problem is really caused by that. We see it all the time. In that case, look at the FR resource-nn.log, where a column tracks every 5 seconds how many requests are running and have run in the past 5 seconds. That can be great for spotting a sudden dramatic rise in requests, which may be the root cause. Then you can look at the FR request-nn.log to see the actual URLs, IP addresses, and more for each request run at that time.
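If you suspect a bot flood, a quick tally of request-nn.log by client IP can confirm it. This Python sketch assumes the IP sits in the third whitespace-separated column, which is an assumption to verify against your own log's header row:

```python
# Hedged sketch: count requests per client IP in FR's request-nn.log to spot
# a spider/bot flood. IP_COL is an assumption -- check your log's header row.

from collections import Counter

IP_COL = 2  # assumed 0-based column index of the client IP

def top_clients(lines, n=5, col=IP_COL):
    """Return the n busiest client IPs as (ip, count) pairs, busiest first."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > col:
            counts[parts[col]] += 1
    return counts.most_common(n)
```

A single IP (or a few) dwarfing the rest during the problem window is a strong hint that automated traffic, not your normal users, drove the load.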

These are all the sort of things that I and the Intergral folks use daily in helping people solve CF server problems. There’s nearly always an explanation for what’s amiss, and it’s often not at all what someone expected (or what many in the community propose are “common problems”.) With the right tools, you can know what’s amiss, based on diagnostics pointing you to the problems, rather than guessing and trying things to see what may work (as many seem to recommend). All that said, while FR can’t always “protect you” from problems, it can certainly significantly aid you, whether with its interface, its logs, its alerts, etc.

Hope that is helpful.

 

/charlie

 


Joe Copley

Nov 23, 2012, 4:06:31 PM
to fusion...@googlegroups.com
Hi Charlie:

Again, very helpful, thanks. I am learning a lot about how ColdFusion works, but I still haven't quite figured out the root cause of the jrun closed connection error.

It happened again a few hours ago. I was there to catch it this time and minimize the downtime, but there is nothing to do for it except kill the jrun process. I did get a CP memory protection alert some minutes prior, but it showed only one running CF request, albeit a very db-heavy one. It generates about 10,000 emails to a subscriber list and logs each one in a database. CF did not log this as a slow request, so I am assuming it was still running when the JVM went down.

There was a JVM out of memory error in the -err log, but it isn't time stamped; I assume it is related to the jrun closed connection error, as it was the last entry before it went down. It is followed by another entry pointing to the long-running template and may be related to jdbc. See below.

It would be great if FRAM could generate a JVM thread dump. I have read that you can send a signal to the process to make it spit one out, even if it isn't accepting connections. I finally figured out how to do this from the console, so I guess I will restart CF that way and leave it until the problem occurs again so I can generate the dumps.

Meanwhile, my working theory is that excessive db activity is swamping the db connection pool and causing it to run out of memory. I will be focusing on making database connections more efficient and coping with bots and agents that cause occasional havoc. FR has helped me identify some of these issues. Still it would be nice to know exactly what is disabling the server. Thanks again for all your help.

Regards,

Joe

jrun coldfusion-err.log, 9:41am:
=======================================
MASKED EXCEPTION END -----------------------------------------------------
java.lang.OutOfMemoryError
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at com.intergral.fusionreactor.osgi.common.FusionReactorMessager.message(FusionReactorMessager.java:39)
    at com.intergral.fusionreactor.jdbc.Wrapper.handle(Wrapper.java:469)
    at com.intergral.fusionreactor.jdbc.MeasurableStatement.metricSignalCanBeHandled(MeasurableStatement.java:180)
    at com.intergral.fusionreactor.jdbc.MeasurableStatement.metricSignalCanBeHandled(MeasurableStatement.java:170)
    at com.intergral.fusionreactor.jdbc.StatementSurrogate.close(StatementSurrogate.java:230)
    at coldfusion.server.j2ee.sql.JRunStatement.close(JRunStatement.java:159)
    at coldfusion.sql.Executive.executeQuery(Executive.java:796)
    at coldfusion.sql.Executive.executeQuery(Executive.java:685)
    at coldfusion.sql.Executive.executeQuery(Executive.java:646)
    at coldfusion.sql.SqlImpl.execute(SqlImpl.java:236)
    at coldfusion.tagext.sql.QueryTag.doEndTag(QueryTag.java:500)
    at cfsendlisting2ecfm1325520080._factor13(E:\inetpub\clientsites\oldhouses\maintain\sendlisting.cfm:1153)
    at cfsendlisting2ecfm1325520080._factor14(E:\inetpub\clientsites\oldhouses\maintain\sendlisting.cfm:1060)
===================================================================

 

Charlie Arehart

Nov 23, 2012, 5:02:27 PM
to fusion...@googlegroups.com

Were you able to see what the request was doing that was running in the alert (according to the stack trace for the request, shown within the thread dump of the alert)?

Also, you say you wish FRAM could trigger a thread dump. But FR did, within the JVM itself, in the FR CP alert. I suppose there could be a time when a thread dump could be useful other than when an alert condition arises. But I’m pretty sure that an external process (FRAM) could not trigger it unless you had configured the monitored instance (CF in your case) for it, such as with RMI. Though as you indicate, you can also do it from the command line.

That said, I would think that if you can get it to respond to that thread dump request, you may well be able to get into the FR interface (in the instance, not FRAM) and get it to take a thread dump (under the Resources > All Threads page).

You mention finding OOM errors in the err log, but without a date/time stamp. What do you see in the out log? Again, I realize that in CF7 things may differ from later releases, but usually the out log DOES have a timestamp for the error. Note also that sometimes you see two OOM messages for each such error, one at the top of a bunch of messages and one at the bottom, and (in the out log, I’m saying) there is usually a time stamp on one of them.

Also, I don’t think the “connection closed” is itself significant, at least from my experience. The simple question (if you’re saying that CF is becoming unresponsive) is what’s causing that. It isn’t unusual for it to be the JVM going nuts as it’s trying to shut down, and again the real question is why that is happening. The thread dump may help, if in fact that “one running request” is the culprit here. But in my experience it’s rare that any one request is the real root cause of a problem.

Instead, there is nearly always some other explanation, generally some (or many, or thousands, or millions) of requests which have contributed to resource consumption over time (very often spiders and bots).

Again, a tool like FR can’t say “here’s the problem and the solution”, but it can provide the diagnostics needed to help. I’ll stress again the value of the FR resource log (in the FR/CF instance logs) for knowing more about what sort of request activity was happening around (and before) the time of the problem. It also tracks the JVM heap use percentage.

There are also various logs that track different uses of other parts of JVM memory (in the FR logs), but I do think they vary based on the JVM and I seem to recall seeing that they have more on JVM 1.5, and CF7 comes with JVM 1.4 (CF8/9/10 come with 1.6).

Finally, you don’t need to slog through this on your own. There are consulting services available from the folks behind FusionReactor (at cfconsultant.com, and mentioned also on the fusion-reactor.com pages.)

/charlie

 

