Hello Rudi,
A good way to find the root cause is to build up an image of what is
running at the time of and just before the outages. To do this in
FusionAnalytics, you can use the timeline to go to the point in time
where an outage occurs on the default perspective or the system
resource perspective. Place the outage in the middle of the timeline
with around 30 minutes either side. Using the right mouse click menu,
you can show problem requests (requests that didn't complete) at the
moment of the outage. If you find requests that are running (still in
executing state when the server restarted), then you should look at the
details of those requests. Look at the database requests that they
have run and see if you can identify any issues. I would recommend
that you look at the log files from the time of the outage. Again, you
can do that with the right mouse click menu (which you can use to view many
related reports and perspectives). This can be a huge help. You
want to look and see if any stack traces have been generated at this
time or if any other issues have been logged. If they have, these are
generally a really good place to look for the root cause.
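If the log files are large, a small script can pull out just the lines around the outage before you read through them. Below is a minimal sketch in Python. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (a hypothetical format; your ColdFusion or FusionReactor logs may look different) and that stack-trace lines either start with "at " or mention an Exception/Error. The function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed (hypothetical) timestamp format at the start of each log line;
# adjust this to match the logs you actually have.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19  # len("2014-01-01 12:00:00")


def lines_near_outage(lines, outage, window_minutes=30):
    """Return log lines whose timestamp is within +/- window of the outage.

    Lines without a timestamp (e.g. the body of a stack trace) are kept
    if the preceding timestamped line was inside the window.
    """
    window = timedelta(minutes=window_minutes)
    selected = []
    inside = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
            inside = abs(ts - outage) <= window
        except ValueError:
            pass  # no timestamp: inherit the previous line's decision
        if inside:
            selected.append(line)
    return selected


def looks_like_stack_trace(line):
    """Heuristic: Java stack frames start with 'at ' after indentation,
    and exception lines usually mention 'Exception' or 'Error'."""
    stripped = line.strip()
    return (stripped.startswith("at ")
            or "Exception" in stripped
            or "Error" in stripped)
```

You would feed it something like `open("app.log").read().splitlines()` and the outage time read off the timeline, then filter the result with `looks_like_stack_trace`.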
If you haven't done so already, I would also recommend that you install
the FR Extensions for ColdFusion
(http://www.fusion-reactor.com/fr/plugins/frec.cfm), turn on the Crash
Protection notification, and set the crash protection to trigger on
either long-running requests or the number of requests. When the Crash
Protection fires, it will generate a stack
trace of all running threads, which will be captured by FusionReactor
and imported into FusionAnalytics. This is really powerful because you
don't have to be watching FusionReactor when an outage happens; the data
gets collected and imported for analysis after the fact. You can then
also look across different outages and see if you have threads hanging at
the same problem point. Again this can be really useful for finding
the root cause of a problem.
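To compare outages, something like the following Python sketch could help once the thread stack traces have been exported into plain lists of frame strings. The input shape and the `common_top_frames`/`recurring` names are assumptions for illustration, not a FusionReactor or FusionAnalytics API. It groups threads by their top few frames and reports the signatures that show up in more than one outage.

```python
from collections import Counter


def common_top_frames(dumps, depth=3):
    """Count in how many outages each top-`depth`-frames signature appears.

    `dumps` maps an outage label to a list of thread stacks; each stack is
    a list of frame strings, innermost (top) frame first. This shape is a
    hypothetical export format used only for illustration.
    """
    counts = Counter()
    for stacks in dumps.values():
        # A set counts each signature once per outage, even if many
        # threads in that outage are stuck at the same place.
        signatures = {tuple(stack[:depth]) for stack in stacks if stack}
        counts.update(signatures)
    return counts


def recurring(dumps, depth=3, min_outages=2):
    """Return signatures seen in at least `min_outages` outages,
    most frequent first."""
    counts = common_top_frames(dumps, depth)
    return [sig for sig, n in counts.most_common() if n >= min_outages]
```

A signature that keeps recurring (for example, many threads waiting on the same database read) is a strong hint about where the hang starts.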
Hope that helps,
Darren