- with the pattern of spiking in CPU, I would agree with your theory that it may be a
memory management issue, but this isn't something that will be obvious always in FR.
Instead, look in the [cf]\runtime\logs (or [jrun]\logs, if on Multiserver). These are
logs that many never even realize exist, and generally are far more valuable than the
traditional CF logs. Search them for the phrase "outofmemory". Once you find the files
with that, look for whichever instance of that error is the first when it starts to
report many of them within a few minutes. That's key, because one kind can lead to
another. You want to know what the first it. It could be "outofmemory: heap", or
"outofmemory: out of swap space", or "outofmemory: unable to create new native
thread", and so on. Each is VERY different, and some in fact mean that you SHOULD NOT
INCREASE the heap size to solve the problem but rather may want to decrease it.
- in fact, in that you say that you run the DB server on the same machine, I would not
be at all surprised if you find that you're getting "out of swap space" (as the first
error when things go south). This happens because SQL Server could be stealing memory
that prevents CF getting what it needs/wants, resulting in that error. With you having
SO MANY queries, I would not be surprised it's happening at all. In fact, unbeknownst
to most, SQL Server is configured by default to use virtually unlimited amounts of
memory. You will likely want to limit it to some reasonable amount. Given you have 4gb
on the box, and about 1gb used by CF, perhaps limiting it to 2GB (or more if you want
to let it use virtual memory) would be wise. (This is of course one of the reasons to
move SQL Server to its own box, not so much "throwing hardware at the problem", but
just giving each cranky child its own playground. Indeed, note as well that in your
graphs, the FR CPU graph DOES in fact show that "system cpu" is quite a bit higher
than "CF CPU" when things go bad, which could well be your SQL Server, and which is of
course another reason to move SQL Server to another box.)
- as for the observation of heap use growing a bit high after this issue, note that
you may find that if you force a GC (in the FR System Metrics page's memory graph)
that it may drop, indicating merely that the JVM had not gotten around to doing a
Major GC on its own, as it was so busy. If it does drop, then the rise in memory is
nothing to worry about. If it doesn't, then something's still holding the memory
(could be any of many things), and over time that could well lead to you running out
of available heap (when it might later try to do a GC and cannot because too much is
still "really" in use.)
- I do agree that with your current design, you're possibly burdening the template
cache. Increasing it probably would be a good idea. It doesn't use much more memory as
you increase it, but admittedly we can only guess about template cache issues in CF
Standard. If you had Enterprise, the Server Monitor does provide useful information on
the template cache which may guide you. I understand you're planning a rewrite, but if
nothing else above has helped, it's worth a try just temporarily increasing the
template cache size until then.
- with you having 800 queries per second, I would propose that for those being tracked
by FR (where the DSN is wrapped), you may want to disable the "query location" option
(in the JDBC settings page for FR)? With it on (the default), FR is having to stack
trace each request at each query to get that info, which in such very high volume of
queries per second can be a burden. I see that you said that things don't seem to
matter whether you do or don't used a wrap DSN. Still, you may have many different
issues, and solving one may simply cause another to become more clear. I'd simply
propose that (especially if you never really use the query location info) that you
disable it just as a precaution.
Hope those are enough to get you started. Obviously, some problems are not solved by
information provided in FusionReactor. The runtime (or jrun) logs are often vital.
I will add that if you want still more help in resolving things, this is where the
Consulting Services offered by the FusionReactor team (and their partners, including
myself) can be helpful. For more info, see http://www.cfconsultant.com. But to be
clear, you will find that this list is also a great resource, at least as long as the
discussion is well-suited to email. :-) And obviously, others benefit, either now or
if they find the discussion in the future.
Let us know how things go with the above.
/charlie
> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske
> Sent: Tuesday, October 12, 2010 11:21 AM
> To: FusionReactor
> Subject: [fusionreactor] Help Interpreting FusionReactor Stats
>
> We have been having an issue when under peak load over the last couple
> weeks. Upon installing Fusion Reactor we were able to optimize some
> trouble spots (long running queries, large pages, etc) and while this
<snip>
Well, one thing I noticed (in the screenshot of the alert email which you offered at
http://www.vsxp.com/fr_cpEmail.html, for those who want to follow along), is that if
you look carefully at the list of requests at the top of the email, you will see that
other than the first request, all the others are within 2 seconds of the time of the
alert. So it's not like they were waiting a long time. Indeed, if you then look at the
list of requests that is repeated at the bottom of the email, with more details,
presented in descending order of execution time, you can see that all but about 11 of
them were under 1 second at that moment the alert fired.
So I'm not saying you don't have a pool problem, just that all those requests "running
at once" at the time of the alert wasn't so clearly problematic. If they took still
longer, then perhaps it would be an issue. This is where FR's request log can help
you, as you can see details of each request which can confirm how long they took. You
can also use the FR resource log to tell you if the avg request time (for all
completed requests) was going up or down in this and around this period.
Just some more thoughts, and more ways that FR can help when faced with this sort of
challenge.
/charlie
PS Though you may not have a pool problem, per se, and you muse about using a new
driver, I would propose as well that you could have another burdensome draw on
resources that you don't realize and may not even want. Check your Client Variables
page in the CF Admin. Do you have any databases listed there as default client
variable storage options (one may even be set to be the default)? If so, if you click
the link for the datasource and view its details, is "disable global variable updates"
left unchecked (its default)? If so, then even if your code "NEVER uses client
variables", but your application.cfm or cfc does enable ClientManagement, then every
request is updating the client variable store. And worse, for those spiders and bots,
it's creating NEW records. Then the purge mechanism comes along every 67 minutes by
default (as set on the client variables page) and does a purge of those client
variable stores (including the registry), which could be its own source of problems.
These are things that have been discussed here before. Again, if you don't want to dig
through the emails, this is again where some consulting help can quickly identify and
explain things.
> -----Original Message-----
> From: charlie arehart [mailto:charli...@carehart.org]
> Sent: Tuesday, October 12, 2010 11:54 AM
> To: 'fusion...@googlegroups.com'
> Subject: RE: [fusionreactor] Help Interpreting FusionReactor Stats
>
> Glen, it's hard to really answer with only the info provided, but I have a few
> immediate thoughts (and I'm totally with you, that often the solution is NOT
> necessarily to throw more hardware at the problem, but simply to make better use
> of what you have):
>
>
<snip>
As for client vars, if you changed them from using a DB to using cookies, that will
have stopped them being added/read/updated from then on, but it does NOT stop CF from
clearing out the client variables wherever they used to live (though if you remove the
DBs from the list of DSNs in the client vars page), then it will at least stop looking
in them. But it will still look at the registry. What was your default before?
Also, note that this page sets only the default: if your code anywhere uses
ClientStorage (on CFApplication or as a this. Variable in application.cfc), then that
code would override this default.
As for the CP alert, do note that you are looking at only 1 request that's running
long. My point was that the other 74 were not, and were less than 2 seconds "long"
(and most less than 1 second). Just saying, don't miss the forest for the tree.
But to that one, if it were to hang "forever", then there is a fix that FR adds (by
way of an enableable option in an FR config file) that would watch for request hanging
in the jrpp buffered output state, as that one is. But if it's only hanging for a
brief time (and only one or a few), then it may not be worth worrying about. I or
someone can provide more info on that if needed.
As for my limiter function, I've not sought to test or optimize it. For me, the value
it adds is so far above any cost it may have. But to be clear, I've not noticed it add
any measurable impact at all on my own server. Then again, I'm not getting hundreds of
requests per second. I'm sure there may be ways to optimize it, and since it's called
so often, even modest gains could be valuable. I'd recommend you test it first in your
environment if that's a concern, before spending time trying to optimize it, though.
:-)
Gotta run now.
/charlie
> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske
It could be that at these moments, SQL Server is the one having fits. You really want
to look at what's going on in there, as well. You could be having blocking problems
(read queries can't run because something writing a lot and locking the table, or
vice-versa), or perhaps you have SQL Server doing something odd like doing a backup or
a index rebuild. I realize those may seem like things that can't happen frequently.
They're just some guesses.
You should look at least at the Windows Event viewer on the server to see if it show
anything odd, then look in any SQL Server logs, then finally use a tool to monitor SQL
Server (whether those built-in, or easily added for free, or separately purchased.) I
wrote about one set of free SQL Server monitoring features that MS released a couple
of years ago:
http://www.carehart.org/blog/client/index.cfm/2007/4/10/sql_server_2005_performance_da
shboard
And I list other SQL Server monitoring tools here:
http://www.carehart.org/cf411/#dbmon
Hope that helps.
/charlie
> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske
> Sent: Tuesday, October 12, 2010 2:30 PM
> To: FusionReactor
> Subject: [fusionreactor] Re: Help Interpreting FusionReactor Stats
>
/charlie
> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske
> Sent: Tuesday, October 12, 2010 5:33 PM
> To: FusionReactor
> Subject: [fusionreactor] Re: Help Interpreting FusionReactor Stats
>
Any ideas?
Thanks!
11/09 13:15:08 metrics Web threads (busy/total): 4/37 Sessions: 2025
Total Memory=508416 Free=113756
MASKED EXCEPTION START
---------------------------------------------------
java.lang.IllegalStateException: Response has already been committed
at
jrun.servlet.jrpp.ProxyEndpoint.resetBuffer(ProxyEndpoint.java:994)
at jrun.servlet.JRunResponse.resetBuffer(JRunResponse.java:85)
at jrun.servlet.JRunResponse.sendRedirect(JRunResponse.java:766)
at
javax.servlet.http.HttpServletResponseWrapper.sendRedirect(HttpServletRe
sponseWrapper.java:170)
at
com.intergral.fusionreactor.filter.surrogate.g.sendRedirect(FusionReacto
rServletReponseProxy.java:245)
at
javax.servlet.http.HttpServletResponseWrapper.sendRedirect(HttpServletRe
sponseWrapper.java:170)
at
com.intergral.fusionreactor.filter.surrogate.g.sendRedirect(FusionReacto
rServletReponseProxy.java:245)
at
edu.suny.sysadm.enrollmentmgmt.oas.actions.OasRequestProcessor.process(U
nknown Source)
at
org.apache.struts.action.ActionServlet.process(ActionServlet.java:1164)
at
org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:415)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:760)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.b(FusionReactorFi
lter.java:376)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.c(FusionReactorFi
lter.java:254)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionRe
actorFilter.java:164)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at
jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at
jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:249
)
at
jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543
)
at
jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:
203)
at
jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.j
ava:320)
at
jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java
:428)
at
jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.jav
a:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
MASKED EXCEPTION END
-----------------------------------------------------
11/09 13:15:12 error
java.lang.IllegalStateException
at
jrun.servlet.JRunResponse.getOutputStream(JRunResponse.java:180)
at
javax.servlet.ServletResponseWrapper.getOutputStream(ServletResponseWrap
per.java:136)
at
com.intergral.fusionreactor.filter.surrogate.g.getOutputStream(FusionRea
ctorServletReponseProxy.java:157)
at
com.intergral.fusionreactor.filter.surrogate.g.getWriter(FusionReactorSe
rvletReponseProxy.java:192)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.h(FusionReactorFi
lter.java:517)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.b(FusionReactorFi
lter.java:429)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.c(FusionReactorFi
lter.java:254)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionRe
actorFilter.java:164)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at
jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at
jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:249
)
at
jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543
)
at
jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:
203)
at
jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.j
ava:320)
at
jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java
:428)
at
jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.jav
a:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
We have been running FR unchanged for quite some time. Never saw this error before. Of course I was assured that nothing else had changed by our developers ;).
The response you have given, gives me something to check on...
Thanks!
- Alex
-----Original Message-----
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of Darren Pywell
Sent: Wednesday, November 10, 2010 6:04 AM
To: FusionReactor
Hi Alex,
Thanks,
Darren
--
You received this message because you are subscribed to the Google Groups "FusionReactor" group.
To post to this group, send email to fusion...@googlegroups.com.
To unsubscribe from this group, send email to fusionreacto...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/fusionreactor?hl=en.
However, when I monitor java pages somefilename.do for example, it says
"No Stream data available" is this a limitation of the current FR
version?
I am really interested in getting time to first byte for a java app...
Thanks!
- Alex
Yes, the java app is in the same instance as CFMX and Stream Metrics are enabled.
No I do not see it in the request history detail screen:
Status: Finished
Status Code: 200 OK
Request Id: 3293512
Date: 12:45:39.892 11-Jan-2011
Client Address: 208.67.34.2
Script Name: /processCreditCardPayment.do
Servlet Path: /processCreditCardPayment.do
URL: https://www.suny.edu/student/oas/processCreditCardPayment.do
Content Type: text/html
Execution Time
Execution Time: 2,953ms
Started Time: 12:45:36.939 11-Jan-2011
Finished Time: 12:45:39.892 11-Jan-2011
Stream Metrics
[No stream data available]
Flags
Is Aborted: false
Stopped: false
Soft Kill Active: false
- Alex
-----Original Message-----
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of Darren Pywell
Sent: Tuesday, January 11, 2011 12:24 PM
To: FusionReactor
Hi Alex,
Happy New Year!
Cheers,
Darren
--
Here is a snippet of the jso code:
<%@ taglib uri="/tags/struts-logic" prefix="logic" %>
<%pageContext.forward("/showLogin.do");%>
/charlie
> -----Original Message-----
> From: fusion...@googlegroups.com
> [mailto:fusion...@googlegroups.com] On Behalf Of DeMarco, Alex
> Sent: Wednesday, January 12, 2011 9:01 PM
> To: fusion...@googlegroups.com
Also thanks Charlie for replying, much appreciated.
I had tried the jsp:forward, but not the pageContext.forward method
method before, but I can confirm that with a simple jsp page which does
pageContext.forward, this is working. I assume there's something
struts-related which is happening in the background and to check that
I'll have to get a simple struts app up and running with Jrun.
I'll get back to you as soon as I know something!
-John
--
John Hawksley
Senior Engineer
Voice: +49 7031 221 502 | Fax: +49 7031 221 524
www.intergral.com
Intergral GmbH - Schickardstr 32 - D-71034 Boeblingen - Germany
Geschaeftsfuehrer: David Tattersall, Darren Pywell
Sitz der Gesellschaft: Boeblingen
I've been working through an issue with one of our applications and have
a question.
Should the IIS Log Entry for Time-Taken match the execution time entered
in the Fusion Reactor log?
Should they be relatively close?
- Alex
I'll offer some expanded thoughts in a followup note.
In the meantime, in case someone may go looking in that field in their IIS
logs, let's note that the time-taken field is optional fields and not
enabled by default. You can turn it on (without any real overhead I've ever
noticed) in the properties for a given web site (or all sites), in the
properties for the logging.
I do recommend enabling it, because it can indeed be useful to observe the
time taken to send a file from IIS to the client, whether it's a CF file or
even static files like jpg, css, html, etc.
More in a moment.
/charlie
Quick thought:
For instance, besides the obvious time CF spends between getting the request
from IIS and generating content to return to it, we could wonder if IIS is
tracking the time spent waiting to get the request *from IIS to CF*. More on
that in a moment. Then there's the question of whether it tracks the time
spent not just "between it and CF" but also "between it and the browser". We
do need to know that to compare the two correctly.
That thought alone may help you as you consider this, Alex. In case you or
others are interested in more, here goes...
Expanded thoughts:
First, there's that question both of whether the IIS log tracks time spent
waiting to get **from IIS to CF**, which could include time spent waiting in
CF's "queue". I suspect it does, so it could then be higher than FR's
request logs, when you have a condition where there was indeed CF queuing.
(And perhaps the same may be true of FR's queuing, but I just don't use that
enough to even hesitate a guess. Perhaps the FR folks can chime in.)
Then there's also the question of time spent sending the generated content
*down to the browser*. That's yet a different factor that could influence
how to interpret the two logs. Note that this time is different to the time
spent generating the content on the server and make it available to be sent
down to the client. Some have likely noticed previous conversations here
where it's been observed that FR does indeed track the time not just "on the
server" but also "getting the content to the client", so it can be longer
than one may expect.
It does also mean that FR's request logs time (and even an FR running
request) can show that a request is "takes a long time" when in fact it's
technically "done" generated the content in CF and is really now just taking
time (for whatever reason) to get sent down to the client. (There was even a
fix provided by the FR folks to help detect a bug in IE when sometimes a CF
request could hang indefinitely, because the IE client sometimes never told
CF, "ok, I've got it. you can consider it finished".)
Anyway, as for this "time to the browser", I've wondered whether IIS's
time-taken also tracks that, because again it would be important to know if
it does or doesn't, in terms of comparing the two.
If anyone gets to explore/confirm stuff, it will be interesting to hear.
/charlie
> -----Original Message-----
> From: fusion...@googlegroups.com
> [mailto:fusion...@googlegroups.com] On Behalf Of charlie arehart
> Sent: Monday, February 14, 2011 12:23 PM
> To: fusion...@googlegroups.com
If I am running the latest release with the latest patch, can I assume
the IE patch is in my build (3.5.4)?
I want to eliminate any IE issues from my logs.
- Alex
For more on it, see a message in the midst a much longer thread (the link
below should jump to Darren's message with details on the patch)
http://groups.google.com/group/fusionreactor/msg/2e57c5fbab5305b1