Help Interpreting FusionReactor Stats

180 views
Skip to first unread message

Glenn Seseske

unread,
Oct 12, 2010, 11:20:57 AM10/12/10
to FusionReactor
We have been having an issue when under peak load over the last couple
weeks. Upon installing Fusion Reactor we were able to optimize some
trouble spots (long running queries, large pages, etc) and while this
definitely helped, we are still having hundreds of requests queued
during peak load and average page load times in excess of 6 seconds
(with some pages taking 20 seconds or more) - almost none of this time
is associated with DB query execution.

To give a little background, we are running a single server with CF8
Standard and SQL Server 2005 (yes, on the same box). It's running a
32-bit version of Windows Server 2003 with a single quad-core CPU and
4GB of RAM (the JVM has a maximum heap of 768MB and max perm of
256MB). I'm expecting people to respond with the simple answer of
throwing more hardware at the problem, and we do have plans to upgrade
in the near future, but I want to have an understanding of exactly
where the bottleneck is occurring so that we can spec any new hardware
appropriately.

So, getting to the point, here's a quick analysis of what I'm seeing
in FusionReactor, along with a screen cap of some of the data and a
Crash Protection email with the running threads. We get huge spikes in
traffic (and yes this is legit traffic, not bots or spiders) for short
periods of time. Our server can handle the spikes up to about 40 page
requests per second. After that, it starts queuing them and goes into
this weird "thrashing" cycle, where the CPU will work for about 30
seconds or less, and then "give up" for 10-20 seconds - not
completely, but enough to see that something is holding it up. This
process repeats until the server eventually recovers on it's own due
to traffic dying down. While there are a lot of JDBC queries occurring
during this process, they are averaging less than 20ms, so it doesn't
seem to be DB processing that is holding things up. One potential
culprit is memory use/management. During this traffic spike the JVM
reallocates and garbage collects often, however, I can't tell if this
is "too often" or appropriate given the traffic. We won't receive this
kind of load again until next weekend, so it's hard to use the trial
and error approach of making a change and seeing what kind of effect
it has.

We currently have two theories:

Theory One: We have about 3000 HTML files (generated from dynamic
pages as a caching mechanism) that are being cfincluded. At some point
(well before we had a problem) we realized that the cfinclude tag
unnecessarily parsed, compiled, and stored these in memory, but we
thought nothing of it, as it performed fine. However, it is now
apparent that these pages may be constantly swapped in and out of the
template cache (which is still set to the default at 1024). Perhaps
this swapping is taxing the memory management of the JVM, causing the
CPU to take a break while memory is reallocated/gc'd. While raising
the template cache may solve this issue, we are in the process of
rearchitecting this altogether to serve the HTML files directly and
not involve CF.

Theory Two: There is a bottleneck with the JDBC connection pooling. As
you will see in the CP email, there are 35 threads (out of 75 active,
the max we have set) currently waiting on the
"coldfusion.server.j2ee.sql.pool.JDBCPool.checkOut(JDBCPool.java:346)"
method. The number of JDBC requests seem to be relatively high at
almost 800 per second. Each page request has about 10-20 queries,
although most take only a millisecond or two to execute. We are
currently using the Macromedia JDBC driver (wrapped or unwrapped makes
no difference). I've seen anecdotal evidence that the Microsoft driver
may perform better, but have no hard evidence to suggest that this
would fix the issue.


FusionReactor Screencap: http://www.vsxp.com/images/fr_screencap.jpg

FusionReacor Crash Protection Email: http://www.vsxp.com/fr_cpEmail.html


Any comments or observations would be appreciated.

Thanks,
Glenn

charlie arehart

unread,
Oct 12, 2010, 11:53:59 AM10/12/10
to fusion...@googlegroups.com
Glen, it's hard to really answer with only the info provided, but I have a few
immediate thoughts (and I'm totally with you, that often the solution is NOT
necessarily to throw more hardware at the problem, but simply to make better use of
what you have):


- with the pattern of spiking in CPU, I would agree with your theory that it may be a
memory management issue, but this isn't something that will be obvious always in FR.
Instead, look in the [cf]\runtime\logs (or [jrun]\logs, if on Multiserver). These are
logs that many never even realize exist, and generally are far more valuable than the
traditional CF logs. Search them for the phrase "outofmemory". Once you find the files
with that, look for whichever instance of that error is the first when it starts to
report many of them within a few minutes. That's key, because one kind can lead to
another. You want to know what the first it. It could be "outofmemory: heap", or
"outofmemory: out of swap space", or "outofmemory: unable to create new native
thread", and so on. Each is VERY different, and some in fact mean that you SHOULD NOT
INCREASE the heap size to solve the problem but rather may want to decrease it.

- in fact, in that you say that you run the DB server on the same machine, I would not
be at all surprised if you find that you're getting "out of swap space" (as the first
error when things go south). This happens because SQL Server could be stealing memory
that prevents CF getting what it needs/wants, resulting in that error. With you having
SO MANY queries, I would not be surprised it's happening at all. In fact, unbeknownst
to most, SQL Server is configured by default to use virtually unlimited amounts of
memory. You will likely want to limit it to some reasonable amount. Given you have 4gb
on the box, and about 1gb used by CF, perhaps limiting it to 2GB (or more if you want
to let it use virtual memory) would be wise. (This is of course one of the reasons to
move SQL Server to its own box, not so much "throwing hardware at the problem", but
just giving each cranky child its own playground. Indeed, note as well that in your
graphs, the FR CPU graph DOES in fact show that "system cpu" is quite a bit higher
than "CF CPU" when things go bad, which could well be your SQL Server, and which is of
course another reason to move SQL Server to another box.)

- as for the observation of heap use growing a bit high after this issue, note that
you may find that if you force a GC (in the FR System Metrics page's memory graph)
that it may drop, indicating merely that the JVM had not gotten around to doing a
Major GC on its own, as it was so busy. If it does drop, then the rise in memory is
nothing to worry about. If it doesn't, then something's still holding the memory
(could be any of many things), and over time that could well lead to you running out
of available heap (when it might later try to do a GC and cannot because too much is
still "really" in use.)

- I do agree that with your current design, you're possibly burdening the template
cache. Increasing it probably would be a good idea. It doesn't use much more memory as
you increase it, but admittedly we can only guess about template cache issues in CF
Standard. If you had Enterprise, the Server Monitor does provide useful information on
the template cache which may guide you. I understand you're planning a rewrite, but if
nothing else above has helped, it's worth a try just temporarily increasing the
template cache size until then.

- with you having 800 queries per second, I would propose that for those being tracked
by FR (where the DSN is wrapped), you may want to disable the "query location" option
(in the JDBC settings page for FR)? With it on (the default), FR is having to stack
trace each request at each query to get that info, which in such very high volume of
queries per second can be a burden. I see that you said that things don't seem to
matter whether you do or don't used a wrap DSN. Still, you may have many different
issues, and solving one may simply cause another to become more clear. I'd simply
propose that (especially if you never really use the query location info) that you
disable it just as a precaution.

Hope those are enough to get you started. Obviously, some problems are not solved by
information provided in FusionReactor. The runtime (or jrun) logs are often vital.

I will add that if you want still more help in resolving things, this is where the
Consulting Services offered by the FusionReactor team (and their partners, including
myself) can be helpful. For more info, see http://www.cfconsultant.com. But to be
clear, you will find that this list is also a great resource, at least as long as the
discussion is well-suited to email. :-) And obviously, others benefit, either now or
if they find the discussion in the future.

Let us know how things go with the above.

/charlie


> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske
> Sent: Tuesday, October 12, 2010 11:21 AM
> To: FusionReactor
> Subject: [fusionreactor] Help Interpreting FusionReactor Stats
>
> We have been having an issue when under peak load over the last couple
> weeks. Upon installing Fusion Reactor we were able to optimize some
> trouble spots (long running queries, large pages, etc) and while this

<snip>

charlie arehart

unread,
Oct 12, 2010, 12:19:00 PM10/12/10
to fusion...@googlegroups.com
Following up on my last note, I had meant to add one more thought, with respect to
interpreting the CP alerts: you refer it and wonder if there may be significance to
the number that show being in the JDBC pool checkout process.

Well, one thing I noticed (in the screenshot of the alert email which you offered at
http://www.vsxp.com/fr_cpEmail.html, for those who want to follow along), is that if
you look carefully at the list of requests at the top of the email, you will see that
other than the first request, all the others are within 2 seconds of the time of the
alert. So it's not like they were waiting a long time. Indeed, if you then look at the
list of requests that is repeated at the bottom of the email, with more details,
presented in descending order of execution time, you can see that all but about 11 of
them were under 1 second at that moment the alert fired.

So I'm not saying you don't have a pool problem, just that all those requests "running
at once" at the time of the alert wasn't so clearly problematic. If they took still
longer, then perhaps it would be an issue. This is where FR's request log can help
you, as you can see details of each request which can confirm how long they took. You
can also use the FR resource log to tell you if the avg request time (for all
completed requests) was going up or down in this and around this period.

Just some more thoughts, and more ways that FR can help when faced with this sort of
challenge.

/charlie

PS Though you may not have a pool problem, per se, and you muse about using a new
driver, I would propose as well that you could have another burdensome draw on
resources that you don't realize and may not even want. Check your Client Variables
page in the CF Admin. Do you have any databases listed there as default client
variable storage options (one may even be set to be the default)? If so, if you click
the link for the datasource and view its details, is "disable global variable updates"
left unchecked (its default)? If so, then even if your code "NEVER uses client
variables", but your application.cfm or cfc does enable ClientManagement, then every
request is updating the client variable store. And worse, for those spiders and bots,
it's creating NEW records. Then the purge mechanism comes along every 67 minutes by
default (as set on the client variables page) and does a purge of those client
variable stores (including the registry), which could be its own source of problems.
These are things that have been discussed here before. Again, if you don't want to dig
through the emails, this is again where some consulting help can quickly identify and
explain things.

> -----Original Message-----
> From: charlie arehart [mailto:charli...@carehart.org]
> Sent: Tuesday, October 12, 2010 11:54 AM
> To: 'fusion...@googlegroups.com'
> Subject: RE: [fusionreactor] Help Interpreting FusionReactor Stats
>
> Glen, it's hard to really answer with only the info provided, but I have a few
> immediate thoughts (and I'm totally with you, that often the solution is NOT
> necessarily to throw more hardware at the problem, but simply to make better use
> of what you have):
>
>

<snip>


Glenn Seseske

unread,
Oct 12, 2010, 12:59:24 PM10/12/10
to FusionReactor
Thanks for the prompt response Charlie.

To address a couple comments from your 2 previous posts:

- I was one step ahead of you on the JRun logs...the word "memory" is
not found in any of them. I should also add that we had upped the max
heap to 768 from 512 and the max perm to 256 from 192. The server
*seemed* to perform a little better during peak, but it may have been
my imagination or the traffic pattern may haven been slightly
different.

- SQL Server never uses more than 1.3GB of RAM and the entire machine
usage never goes over 3GB.

- Per the Client Variables...we are not using them, but they were set
to be stored in the cookies. We disabled them completely yesterday,
but whether or not this does anything will remain to be seen.

One other thing to note: You can see from the CP alert that the stream
was opened at about 1000ms, and is currently at 20250ms. We don't
cfflush anything (the page is done executing at 1000ms), so it's clear
that it's streaming out very slow (hanging on the
java.net.SocketOutputStream.socketWrite0 method). Other requests at
this time experience the same traits where they take anywhere from 1-8-
seconds to process (clearly too long) but also take another 2-10
seconds to stream to the client (even when the client appears to be on
a broadband connection). These could probably be discounted if there
were only a few, but the behavior seems to plague most of the requests
during this time. I though this might point to a network bottleneck,
but we don't appear to be anywhere near our limit.

I have a feeling a lot of this traffic is people constantly hitting
refresh in their browser (waiting for a report to update), and I'm
contemplating using your "limiter" function (mentioned here:
http://www.carehart.org/blog/client/index.cfm/2010/5/21/throttling_by_ip_address)
but thought this may add some additional overhead. Would you
anticipate it adding to the problem (assuming it doesn't actually
limit any requests)?

Glenn



On Oct 12, 12:19 pm, "charlie arehart" <charlie_li...@carehart.org>
wrote:
> Following up on my last note, I had meant to add one more thought, with respect to
> interpreting the CP alerts: you refer it and wonder if there may be significance to
> the number that show being in the JDBC pool checkout process.
>
> Well, one thing I noticed (in the screenshot of the alert email which you offered athttp://www.vsxp.com/fr_cpEmail.html, for those who want to follow along), is that if

charlie arehart

unread,
Oct 12, 2010, 1:52:13 PM10/12/10
to fusion...@googlegroups.com
Good stuff. I have only a minute to write before I have an engagement.

As for client vars, if you changed them from using a DB to using cookies, that will
have stopped them being added/read/updated from then on, but it does NOT stop CF from
clearing out the client variables wherever they used to live (though if you remove the
DBs from the list of DSNs in the client vars page), then it will at least stop looking
in them. But it will still look at the registry. What was your default before?

Also, note that this page sets only the default: if your code anywhere uses
ClientStorage (on CFApplication or as a this. Variable in application.cfc), then that
code would override this default.

As for the CP alert, do note that you are looking at only 1 request that's running
long. My point was that the other 74 were not, and were less than 2 seconds "long"
(and most less than 1 second). Just saying, don't miss the forest for the tree.

But to that one, if it were to hang "forever", then there is a fix that FR adds (by
way of an enableable option in an FR config file) that would watch for request hanging
in the jrpp buffered output state, as that one is. But if it's only hanging for a
brief time (and only one or a few), then it may not be worth worrying about. I or
someone can provide more info on that if needed.

As for my limiter function, I've not sought to test or optimize it. For me, the value
it adds is so far above any cost it may have. But to be clear, I've not noticed it add
any measurable impact at all on my own server. Then again, I'm not getting hundreds of
requests per second. I'm sure there may be ways to optimize it, and since it's called
so often, even modest gains could be valuable. I'd recommend you test it first in your
environment if that's a concern, before spending time trying to optimize it, though.
:-)

Gotta run now.

/charlie


> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske

Glenn Seseske

unread,
Oct 12, 2010, 2:30:18 PM10/12/10
to FusionReactor
The client variables were always set to use cookies (there were no
datasources configured there), but we changed them to "None"
yesterday. I don't think the cookie setting would have been a problem,
but since we aren't using them we might as well disable them. We are
not using the clientManagement or sessionManagement attributes of the
cfapplication tag, and as I understand it they will both default to
"No".

Your point about the majority of the requests in the CP alert having
just started is a good one, and something I overlooked, however,
you'll notice the timestamp (~17:00) is very close to a CPU peak on
the graph (http://www.vsxp.com/images/fr_screencap.jpg) where CF seems
to be actively working to process the requests (as opposed to one of
the CPU troughs) . The average execution time of completed requests is
much lower during a CPU peak (1-2 seconds) than during a CPU trough
(6-8 seconds). This seems to be the ultimate question...why is the CPU
taking a break?

Glenn


On Oct 12, 1:52 pm, "charlie arehart" <charlie_li...@carehart.org>
wrote:
> >http://www.carehart.org/blog/client/index.cfm/2010/5/21/throttling_by...

charlie arehart

unread,
Oct 12, 2010, 4:44:57 PM10/12/10
to fusion...@googlegroups.com
But again, do you see that the "cpu peak" is mostly non-CF cpu (the white part), while
CF at that moment is only about 20% (the blue part) at 17:00?

It could be that at these moments, SQL Server is the one having fits. You really want
to look at what's going on in there, as well. You could be having blocking problems
(read queries can't run because something writing a lot and locking the table, or
vice-versa), or perhaps you have SQL Server doing something odd like doing a backup or
a index rebuild. I realize those may seem like things that can't happen frequently.
They're just some guesses.

You should look at least at the Windows Event viewer on the server to see if it show
anything odd, then look in any SQL Server logs, then finally use a tool to monitor SQL
Server (whether those built-in, or easily added for free, or separately purchased.) I
wrote about one set of free SQL Server monitoring features that MS released a couple
of years ago:

http://www.carehart.org/blog/client/index.cfm/2007/4/10/sql_server_2005_performance_da
shboard

And I list other SQL Server monitoring tools here:

http://www.carehart.org/cf411/#dbmon

Hope that helps.

/charlie


> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske
> Sent: Tuesday, October 12, 2010 2:30 PM
> To: FusionReactor
> Subject: [fusionreactor] Re: Help Interpreting FusionReactor Stats
>

Glenn Seseske

unread,
Oct 12, 2010, 5:32:41 PM10/12/10
to FusionReactor
Thanks Charlie. Yes, the other CPU usage is from SQL Server. I have
reviewed the Windows Event logs and SQL Server logs and found nothing
of interest. I also suspected locks on the DB, and was monitoring
these (as well as a few other things) in Windows Performance Monitor,
and while there are some locks, they don't seem like they could be
blocking all the traffic that is getting queued.

I'll look into using the SQL tools you mention during our peak load
next weekend, but I guess my best bet in the meantime (barring any
sudden epiphanies), is to finish cleaning up (i.e. removing from CF)
all of those cfincluded HTML files that are being parsed, compiled,
and loaded into memory, and rework some of those queries that are part
of the 800/sec. If I can knock out a couple from each page request I
may be able to knock off a couple hundred from the total/sec.

Glenn


On Oct 12, 4:44 pm, "charlie arehart" <charlie_li...@carehart.org>
wrote:
> But again, do you see that the "cpu peak" is mostly non-CF cpu (the white part), while
> CF at that moment is only about 20% (the blue part) at 17:00?
>
> It could be that at these moments, SQL Server is the one having fits. You really want
> to look at what's going on in there, as well. You could be having blocking problems
> (read queries can't run because something writing a lot and locking the table, or
> vice-versa), or perhaps you have SQL Server doing something odd like doing a backup or
> a index rebuild. I realize those may seem like things that can't happen frequently.
> They're just some guesses.
>
> You should look at least at the Windows Event viewer on the server to see if it show
> anything odd, then look in any SQL Server logs, then finally use a tool to monitor SQL
> Server (whether those built-in, or easily added for free, or separately purchased.) I
> wrote about one set of free SQL Server monitoring features that MS released a couple
> of years ago:
>
> http://www.carehart.org/blog/client/index.cfm/2007/4/10/sql_server_20...

charlie arehart

unread,
Oct 12, 2010, 8:40:39 PM10/12/10
to fusion...@googlegroups.com
Will look forward to seeing what happens then. :-)

/charlie


> -----Original Message-----
> From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On
> Behalf Of Glenn Seseske
> Sent: Tuesday, October 12, 2010 5:33 PM
> To: FusionReactor
> Subject: [fusionreactor] Re: Help Interpreting FusionReactor Stats
>

Glenn Seseske

unread,
Oct 18, 2010, 10:10:49 AM10/18/10
to FusionReactor
Despite all of our optimizations, the problem occurred again. We
didn't get quite as much traffic, and response times were down
slightly, but CF still got stuck with 75 concurrent requests (our max)
and starting queuing up beyond that. Clearly, at some point a server
can only handle a certain amount of load, but it's just strange that
we hit a brick wall, rather than a gradual increase, especially when
the CPU stays under 70%. Perhaps this problem is beyond the scope of
this forum, but I'll give it one last shot by filling you in on what
we saw yesterday.

http://www.vsxp.com/images/fr_2010_10_17.jpg

The first screencap is of FusionReactor (similar to the original
post). You'll see the Active Requests start getting hung up around 45+
page requests per second. This is consistent with previous weeks.
Memory usage looks much better than last week after our optimizations.
Total DB queries were down slightly after our optimizations, but we
still crossed over 500 JDBC requests per second. The one thing that
stands out (and I didn't notice last week), is that the Average
Execution Time of Completed JDBC Requests more than doubles during the
hung period. I didn't notice this at first because the average
response time is minimal (less than 20ms), but the average response
time outside of this period is less than 5ms.

http://www.vsxp.com/images/perfmon_2010_10_17.jpg

The second screencap (which is much hard to look at, I apologize for
the overload of information), is of Windows Performance Monitor when
the hang first starts to occur. The first thing you'll notice is a
drop, and then spike of CF page requests per second (light blue).
Immediately after this, SQL batch requests and SQL compilations drop
to 0 for a second. Then SQL Server locks start to occur and CF just
keeps opening new DB connections. What's strange is that the total
batch requests in SQL doesn't go up by much, although SQL seems to be
doing more compilations. SQL's buffer cache hit ratio is 99%, so there
don't seem to be any memory issues here. The concept of a "rogue"
query causing this issue just doesn't seem likely, as it always seems
to be traffic based once we pass 45 page requests per second.

Clearly I could drill down further on what's happening at the DB (with
the tools Charlie mentioned), but it's hard to tell what is a symptom
of the problem, versus the cause of the problem. I could drill down on
the SQL locks, only to find the problem was with the macromedia JDBC
driver, and the Microsoft JDBC driver solves the issue. The main
problem in troubleshooting this is that it only occurs once or twice a
week, and we have thus far been unable to cause the issue with load
testing software (due to a lack of testing machines).

My current plan is to rearchitect some of the pages in order to bring
down the number of queries and reduce SQL recompilations (and
hopefully locks) - although I have no idea if this will solve the
problem. I will also monitor SQL more closely next weekend (although
we may not get enough traffic to cause the issue). We are planning on
moving SQL to it's own server in the near future, but I'm not
confidence this will fix the issue either. What would your next steps
be?

Thanks,
Glenn


On Oct 12, 8:40 pm, "charlie arehart" <charlie_li...@carehart.org>
wrote:

Glenn Seseske

unread,
Oct 21, 2010, 7:37:30 AM10/21/10
to FusionReactor
We are now able to replicate the problem during load testing and have
narrowed it down to an issue with DB connections from CF. Even a page
with almost no CF code except for a single cfquery tag (with a select
query that executes in 1-2ms) will cause the server to hang after
about a minute of 40+ page requests per second. During this time the
CPU usage peaks around 20% (roughly 12% CF, 8% SQL Server). Both
memory and disk usage seem negligible. Other applications (non CF) can
query the SQL DB during this time, and can be served through the same
IIS app pool with no issue.

We tried changing all available settings on the CF datasource
(maintain connections, limit connections, etc), but nothing improved
(and usually got worse). We tried changing the native SQL driver to
the newest version of the Microsoft JDBC driver (sqljdbc4.jar), but
results were largely the same. Again the majority of the threads
during this time seem hung on
"coldfusion.server.j2ee.sql.pool.JDBCPool.checkOut(JDBCPool.java:346)"
with a few hung on
"coldfusion.server.j2ee.pool.ObjectPool.checkIn(ObjectPool.java:263)",
so the problem appears to be with the connection pooling. As far as we
know, we have no limits set on connections in CF or SQL Server.

What could be causing problems with the connection pooling when the
server resources do not appear taxed? Is this simply a limit of a
single CF server, roughly 40 page requests (with DB queries) per
second max?


P.S.

Potentially similar issues are discussed here, but didn't not help in
solving our issue:

http://forums.adobe.com/thread/673262?decorator=print&displayFullThread=true

and here:

http://kb2.adobe.com/cps/187/tn_18772.html (however we have SQL
Server on the same box, and the 127.0.0.1 address is already in the
hosts file)

Darren Pywell

unread,
Oct 21, 2010, 12:50:22 PM10/21/10
to FusionReactor
Hi Glenn,

The JDBC pooling is pretty complex and I'll be honest that there are
probably some elements of this that I still don't understand but I
hope that what I know will help:

The pool is created per Data Source with a limited size which is
control via the Max Pooled Statements on the Data Sources Advanced
Settings. This is the Maximum number of entries in the pool and if the
pool is exhausted then a request must wait for a pool entry to become
available before it can proceed or eventually it times out (default 30
seconds).

The pool is implemented by the ObjectPool class that you list which
has locking checkin and checkout methods to make sure to make sure
that only one thread can affect the pool at a time.

The checkout method on the pool tries to checkout a pool entry for up
to the number of seconds specified in Login Timeout (sec) setting on
the Data Sources Advanced Settings. If the checkout can't get an entry
from the pool it waits and releases the pool lock so that checkin's
can occur. This does mean however that lots of checkout can be waiting
for a pool element to become available and when one does become
available any of the waiting checkouts may get it... so you could have
checkouts that wait a very long time. No guarantee on which one of the
waiting checkouts wins!

When all of your CF request threads are waiting for a checkout or
checkin then you have effectively locked up the CF engine. This
shouldn't happen because the checkin's should return entries to the
pool but the checkin's also have to wait "some time" for the checkout
threads to do the checkout process. Also there is a piece of magic
here that I've never been able to understand but it seems sometimes
like the checkin don't always let the checkouts know that an entry has
been returned to the pool until a certain number of checkins have
occurred. The exact reason for this I'm still trying to track down.

So to resolve the problem typically you have to create a larger pool
and you have to set the Login timeout down. You could try doubling
your pool size and lowering the connection timeout to just a few
seconds (perhaps 5). I think it is 30 seconds be default and see it
that helps. Also make sure that the number of connections that the DB
is willing to accept is at least as high as the pool size and then
some otherwise connections may hang there too.

Hope that helps,
Darren
>      http://forums.adobe.com/thread/673262?decorator=print&displayFullThre...
>
> and here:
>
>      http://kb2.adobe.com/cps/187/tn_18772.html(however we have SQL

Glenn Seseske

unread,
Oct 21, 2010, 1:30:57 PM10/21/10
to FusionReactor
Thanks for insight Darren, and thanks again to Charlie for giving our
original (and perhaps too verbose) post some attention. I actually
just solved this issue about an hour ago and it turns out I took a
LONG way to a quick solution.

The problem was additional sql "batch requests" being sent with every
query we executed. It turns out the default CF datasource settings
will cause these to be added (one for Long Text Buffer, one for Blob
Buffer, and one for Validation Query). Removing values for these
settings removes the batch queries. It seems the Validation Query
setting is a bug that some users are experiencing (http://
www.webapper.com/blog/index.php/2008/01/28/incorrect-syntax-near-0-dont-shoot-the-messenger/),
however we were not seeing the exception they mention in our logs.

In addition to the ironing out the hanging condition, we also picked
up some CPU headroom (went from about 15-20% during our "simple" load
test to 5-10%).

The only way we got to this solution was by running the same load
against a near-identical ASP.NET page and monitoring counters and
analyzing trace results in SQL. .NET was able to outperform CF by 3-4
fold, but as soon as we noticed the additional batch requests being
generated by CF, we knew that was the issue. However, it seems there
is still an additional batch request being initiated by CF with every
query (making the CF batch requests double the .NET count). The trace
shows an extra "SET TRANSACTION ISOLATION LEVEL READ COMMITTED" for
each request. I think I understand what this does (and I know I've
seen it before), but if .NET doesn't think it's needed perhaps I can
removed it from the CF side as well (if that's even possible). So far
I've had no luck. Regardless, our hanging condition appears to be
gone.

Thanks,
Glenn

DeMarco, Alex

unread,
Nov 9, 2010, 1:28:39 PM11/9/10
to fusion...@googlegroups.com

Lately we have been seeing this error appear in our Jrun Log. Not much
on the server side has changed. I am wondering if this is some Fusion
Reactor error?

Any ideas?

Thanks!

11/09 13:15:08 metrics Web threads (busy/total): 4/37 Sessions: 2025
Total Memory=508416 Free=113756
MASKED EXCEPTION START
---------------------------------------------------
java.lang.IllegalStateException: Response has already been committed
at
jrun.servlet.jrpp.ProxyEndpoint.resetBuffer(ProxyEndpoint.java:994)
at jrun.servlet.JRunResponse.resetBuffer(JRunResponse.java:85)
at jrun.servlet.JRunResponse.sendRedirect(JRunResponse.java:766)
at
javax.servlet.http.HttpServletResponseWrapper.sendRedirect(HttpServletRe
sponseWrapper.java:170)
at
com.intergral.fusionreactor.filter.surrogate.g.sendRedirect(FusionReacto
rServletReponseProxy.java:245)
at
javax.servlet.http.HttpServletResponseWrapper.sendRedirect(HttpServletRe
sponseWrapper.java:170)
at
com.intergral.fusionreactor.filter.surrogate.g.sendRedirect(FusionReacto
rServletReponseProxy.java:245)
at
edu.suny.sysadm.enrollmentmgmt.oas.actions.OasRequestProcessor.process(U
nknown Source)
at
org.apache.struts.action.ActionServlet.process(ActionServlet.java:1164)
at
org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:415)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:760)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.b(FusionReactorFi
lter.java:376)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.c(FusionReactorFi
lter.java:254)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionRe
actorFilter.java:164)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at
jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at
jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:249
)
at
jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543
)
at
jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:
203)
at
jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.j
ava:320)
at
jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java
:428)
at
jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.jav
a:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
MASKED EXCEPTION END
-----------------------------------------------------
11/09 13:15:12 error
java.lang.IllegalStateException
at
jrun.servlet.JRunResponse.getOutputStream(JRunResponse.java:180)
at
javax.servlet.ServletResponseWrapper.getOutputStream(ServletResponseWrap
per.java:136)
at
com.intergral.fusionreactor.filter.surrogate.g.getOutputStream(FusionRea
ctorServletReponseProxy.java:157)
at
com.intergral.fusionreactor.filter.surrogate.g.getWriter(FusionReactorSe
rvletReponseProxy.java:192)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.h(FusionReactorFi
lter.java:517)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.b(FusionReactorFi
lter.java:429)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.c(FusionReactorFi
lter.java:254)
at
com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionRe
actorFilter.java:164)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at
jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at
jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:249
)
at
jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543
)
at
jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:
203)
at
jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.j
ava:320)
at
jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java
:428)
at
jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.jav
a:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

Darren Pywell

unread,
Nov 10, 2010, 6:04:03 AM11/10/10
to FusionReactor
Hi Alex,

Have you installed a new version of FR and now this problem is
occurring?

Although sendRedirect() appears in FusionReactor in the stack trace we
don't ever call sendRedirect() inside FusionReactor. It's only
implemented in our FusionReactorServletReponseProxy and Wrapper for
completeness. This means that when the underlying
OasRequestProcessor.process() called sendRedirect() it passes straight
through FR and on into jrun.servlet.JRunResponse.sendRedirect(). When
sendRedirect() tries to reset the buffer of data to send to the client
it cannot because the buffer has already been committed to the client
so an IllegalStateException is raised. So that suggests that something
in the application already sends data and then attempts to send a
redirect. Has any kind of redirect been added to the code.

Thanks,
Darren

DeMarco, Alex

unread,
Nov 10, 2010, 8:36:38 AM11/10/10
to fusion...@googlegroups.com
Thank you Darren,

We have been running FR unchanged for quite some time. Never saw this error before. Of course I was assured that nothing else had changed by our developers ;).

The response you have given, gives me something to check on...

Thanks!

- Alex

-----Original Message-----
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of Darren Pywell
Sent: Wednesday, November 10, 2010 6:04 AM
To: FusionReactor

Hi Alex,

Thanks,
Darren

--
You received this message because you are subscribed to the Google Groups "FusionReactor" group.
To post to this group, send email to fusion...@googlegroups.com.
To unsubscribe from this group, send email to fusionreacto...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/fusionreactor?hl=en.

Darren Pywell

unread,
Nov 10, 2010, 9:11:40 AM11/10/10
to FusionReactor
Hey Alex,

Sure thing. You know us... if it is our problem... we want to fix it.
If it turns out not to be our problem but we helped; that's also good!

Many Thanks,
Darren

DeMarco, Alex

unread,
Jan 11, 2011, 11:58:01 AM1/11/11
to fusion...@googlegroups.com
When I monitor CFMX requests in FR the Stream Metrics section is
populated with information such as Time to First Byte etc..

However, when I monitor java pages somefilename.do for example, it says
"No Stream data available" is this a limitation of the current FR
version?

I am really interested in getting time to first byte for a java app...

Thanks!

- Alex

Darren Pywell

unread,
Jan 11, 2011, 12:23:36 PM1/11/11
to FusionReactor
Hi Alex,

Happy New Year!

Is this is the same instance as the CFMX code? If not you might want
to check the Request Settings->Stream Metrics to see if they are
enabled.

This shouldn't be a limitation in FR. Do you see this data filled in
on requests in the Request History?

Cheers,
Darren

DeMarco, Alex

unread,
Jan 11, 2011, 12:46:50 PM1/11/11
to fusion...@googlegroups.com
Happy N-Year to you folks as well.. Looking forward to FR 4.x in 2011!

Yes, the java app is in the same instance as CFMX and Stream Metrics are enabled.

No I do not see it in the request history detail screen:
Status: Finished
Status Code: 200 OK
Request Id: 3293512
Date: 12:45:39.892 11-Jan-2011
Client Address: 208.67.34.2
Script Name: /processCreditCardPayment.do
Servlet Path: /processCreditCardPayment.do
URL: https://www.suny.edu/student/oas/processCreditCardPayment.do
Content Type: text/html
Execution Time
Execution Time: 2,953ms
Started Time: 12:45:36.939 11-Jan-2011
Finished Time: 12:45:39.892 11-Jan-2011
Stream Metrics
[No stream data available]
Flags
Is Aborted: false
Stopped: false
Soft Kill Active: false

- Alex

-----Original Message-----
From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of Darren Pywell
Sent: Tuesday, January 11, 2011 12:24 PM
To: FusionReactor

Hi Alex,

Happy New Year!

Cheers,
Darren

--

John Hawksley (Fusion Team)

unread,
Jan 12, 2011, 12:25:24 PM1/12/11
to FusionReactor
Hi Alex,

We suspect the JSP pages are doing jsp:forward or jsp:include. A lot
of this handling has been refactored in FusionReactor 3.5.4, which is
a downloadable patch:

http://www.fusion-reactor.com/support/kb/FRS-243.cfm

Could you apply that patch and let us know if the problem persists?
The patch was issued to solve a JDBC issue but the fixes incorporated
for that might also work for you too. We've checked jsp forwards with
the same (fixed) code as is in 3.5.4 and we couldn't replicate the
issue.

If so, can you confirm your app is actually doing jsp:forward and/or
includes?

Many thanks,
-John

DeMarco, Alex

unread,
Jan 12, 2011, 3:34:00 PM1/12/11
to FusionReactor
I will apply the patch and let you know.
 
thanks!
 
- Alex


From: fusion...@googlegroups.com on behalf of John Hawksley (Fusion Team)
Sent: Wed 1/12/2011 12:25 PM

DeMarco, Alex

unread,
Jan 12, 2011, 9:00:58 PM1/12/11
to fusion...@googlegroups.com
Sorry to say, this did not solve my issue.

Here is a snippet of the jso code:

<%@ taglib uri="/tags/struts-logic" prefix="logic" %>
<%pageContext.forward("/showLogin.do");%>

charlie arehart

unread,
Jan 12, 2011, 9:26:25 PM1/12/11
to fusion...@googlegroups.com
But at least you are confirming that you're doing a forward, which confirms
John's suspicion. He'd noted that the patch addressed some known problems.
Now that they know that it is what underlies your problem, perhaps he may be
able to offer more.

/charlie


> -----Original Message-----
> From: fusion...@googlegroups.com
> [mailto:fusion...@googlegroups.com] On Behalf Of DeMarco, Alex
> Sent: Wednesday, January 12, 2011 9:01 PM
> To: fusion...@googlegroups.com

John Hawksley

unread,
Jan 13, 2011, 6:32:08 AM1/13/11
to fusion...@googlegroups.com
Hi Alex,

Also thanks Charlie for replying, much appreciated.

I had tried the jsp:forward, but not the pageContext.forward method
method before, but I can confirm that with a simple jsp page which does
pageContext.forward, this is working. I assume there's something
struts-related which is happening in the background and to check that
I'll have to get a simple struts app up and running with Jrun.

I'll get back to you as soon as I know something!

-John

--
John Hawksley
Senior Engineer

Voice: +49 7031 221 502 | Fax: +49 7031 221 524
www.intergral.com

Intergral GmbH - Schickardstr 32 - D-71034 Boeblingen - Germany
Geschaeftsfuehrer: David Tattersall, Darren Pywell
Sitz der Gesellschaft: Boeblingen

David Stockton

unread,
Jan 14, 2011, 10:41:53 AM1/14/11
to FusionReactor
Hi Alex,

I'm one of the FusionReactor support engineers.
I just ran up the struts 1.3.10 example apps on my JRun4 instance with
FusionReactor installed and I see stream metrics for all pages
(including those that do a "pageContext.forward()" call). I'm
wondering if this could be an installation issue? Can you send me your
"default-web.xml" file from the SERVER-INF folder and the "web.xml"
file from the struts application WAR please?

Also, just to double check, can you confirm that stream metrics are
enabled for this instance (in FusionReactor goto Requests -> Request
Settings -> Stream Metrics and ensure it's set to "Enabled"). The
setting is independent for each instance so please ensure you're
checking the instance the struts application runs on if you have
multiple instances.

I'll contact you directly from my work email address so that you don't
have to send them over Google.

Many thanks,
David Stockton
Fusion Support Team

On Jan 13, 11:32 am, John Hawksley <john_hawks...@intergral.com>
wrote:

DeMarco, Alex

unread,
Feb 14, 2011, 11:56:40 AM2/14/11
to fusion...@googlegroups.com
Hello All,

I've been working through an issue with one of our applications and have
a question.

Should the IIS Log Entry for Time-Taken match the execution time entered
in the Fusion Reactor log?

Should they be relatively close?


- Alex

charlie arehart

unread,
Feb 14, 2011, 12:22:59 PM2/14/11
to fusion...@googlegroups.com
That's an excellent question, Alex, and one I've wondered about myself. I
don't have the answer.

I'll offer some expanded thoughts in a followup note.

In the meantime, in case someone may go looking in that field in their IIS
logs, let's note that the time-taken field is optional fields and not
enabled by default. You can turn it on (without any real overhead I've ever
noticed) in the properties for a given web site (or all sites), in the
properties for the logging.

I do recommend enabling it, because it can indeed be useful to observe the
time taken to send a file from IIS to the client, whether it's a CF file or
even static files like jpg, css, html, etc.

More in a moment.

/charlie

charlie arehart

unread,
Feb 14, 2011, 12:28:00 PM2/14/11
to fusion...@googlegroups.com
OK, so on this question of whether the IIS time-taken and FR request log
exectime fields should be the same, like I said in the last message, it's a
good question and one I've wondered about. Some might presume yes they are
the same, others that no they are not. I'm really not sure and have wished
to get a good conclusive answer, so I'll look forward to what people here
may say. :-)

Quick thought:

For instance, besides the obvious time CF spends between getting the request
from IIS and generating content to return to it, we could wonder if IIS is
tracking the time spent waiting to get the request *from IIS to CF*. More on
that in a moment. Then there's the question of whether it tracks the time
spent not just "between it and CF" but also "between it and the browser". We
do need to know that to compare the two correctly.

That thought alone may help you as you consider this, Alex. In case you or
others are interested in more, here goes...

Expanded thoughts:

First, there's that question both of whether the IIS log tracks time spent
waiting to get **from IIS to CF**, which could include time spent waiting in
CF's "queue". I suspect it does, so it could then be higher than FR's
request logs, when you have a condition where there was indeed CF queuing.
(And perhaps the same may be true of FR's queuing, but I just don't use that
enough to even hesitate a guess. Perhaps the FR folks can chime in.)

Then there's also the question of time spent sending the generated content
*down to the browser*. That's yet a different factor that could influence
how to interpret the two logs. Note that this time is different to the time
spent generating the content on the server and make it available to be sent
down to the client. Some have likely noticed previous conversations here
where it's been observed that FR does indeed track the time not just "on the
server" but also "getting the content to the client", so it can be longer
than one may expect.

It does also mean that FR's request logs time (and even an FR running
request) can show that a request is "takes a long time" when in fact it's
technically "done" generated the content in CF and is really now just taking
time (for whatever reason) to get sent down to the client. (There was even a
fix provided by the FR folks to help detect a bug in IE when sometimes a CF
request could hang indefinitely, because the IE client sometimes never told
CF, "ok, I've got it. you can consider it finished".)

Anyway, as for this "time to the browser", I've wondered whether IIS's
time-taken also tracks that, because again it would be important to know if
it does or doesn't, in terms of comparing the two.

If anyone gets to explore/confirm stuff, it will be interesting to hear.

/charlie


> -----Original Message-----
> From: fusion...@googlegroups.com
> [mailto:fusion...@googlegroups.com] On Behalf Of charlie arehart
> Sent: Monday, February 14, 2011 12:23 PM
> To: fusion...@googlegroups.com

DeMarco, Alex

unread,
Feb 14, 2011, 1:16:49 PM2/14/11
to fusion...@googlegroups.com
Good Stuff, Charlie....

If I am running the latest release with the latest patch, can I assume
the IE patch is in my build (3.5.4)?

I want to eliminate any IE issues from my logs.

- Alex

charlie arehart

unread,
Feb 15, 2011, 12:39:34 PM2/15/11
to fusion...@googlegroups.com
No, Alex, I don't think so. The patch (just the setting of a switch in the
FR config) was offered as a manually enabled option, not built-in, even in
later releases (if I recall correctly), because it is somewhat
"experimental".

For more on it, see a message in the midst a much longer thread (the link
below should jump to Darren's message with details on the patch)

http://groups.google.com/group/fusionreactor/msg/2e57c5fbab5305b1

Reply all
Reply to author
Forward
0 new messages