Lyle and I work on the same kind of projects, it seems (real estate
search), so perhaps my advice is especially relevant for Lyle since I
have built the exact same stuff.
The PermGen error is the easier one to deal with because it is the one
that tends to grow worse over time, and it eventually stabilizes if you
give it enough space. You could also schedule an intentional Railo
restart once a week just to avoid downtime in the middle of the day if
you are desperate.
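For reference, the PermGen ceiling is just a JVM argument, so "giving
it enough" is a one-line config change. A minimal sketch, assuming a
Tomcat-style setenv file (the sizes are only examples, not a
recommendation):

# setenv.sh (or wherever your Railo install reads its JVM arguments)
# example sizes only - tune these for your own applications
CATALINA_OPTS="$CATALINA_OPTS -XX:PermSize=128m -XX:MaxPermSize=256m"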
If you are overriding the error page and doing custom logging, make
sure you occasionally check the context's logs for errors too, since if
your custom error script fails, the error can go into the context's
log instead.
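For anyone who hasn't hit this failure mode: a hedged sketch of a
defensive onError in Application.cfc, where logAndEmailError is a
made-up name for whatever your custom handler does, so that a failure
inside the handler itself still gets written somewhere you watch:

<cffunction name="onError" returntype="void" output="false">
  <cfargument name="exception" required="true">
  <cfargument name="eventName" type="string" required="false" default="">
  <cftry>
    <!--- your custom notification, e.g. email the error details --->
    <cfset logAndEmailError(arguments.exception)>
    <cfcatch type="any">
      <!--- fallback: record both errors in a log file you actually check --->
      <cflog file="onerror-fallback" type="error"
        text="Original: #arguments.exception.message# / Handler failed: #cfcatch.message#">
    </cfcatch>
  </cftry>
</cffunction>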
I'm obsessed with performance optimization and with validating input to
eliminate errors, even the less important ones generated by robots. I
have seen other companies with hundreds of errors throughout the day
because they don't have much validation to protect against form spam.
That makes it a lot harder to determine which errors are serious
problems.
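The validation itself doesn't need to be fancy. A minimal sketch,
assuming a hypothetical url.listingId parameter - reject junk up front
with a 400 so robot traffic never reaches the heavy code or pollutes
the error log:

<cfparam name="url.listingId" default="">
<cfif NOT isValid("integer", url.listingId)>
  <cfheader statuscode="400" statustext="Bad Request">
  <cfoutput>Invalid request.</cfoutput>
  <cfabort>
</cfif>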
Making sure you don't have long-running requests that spike memory
usage is important, as this used to crash Adobe ColdFusion too. I
usually break requests into multiple requests if they are going to be
running more than a few minutes. Make sure high-memory scripts aren't
crawled by robots by adding extra security, such as checking that the
IP or referer is your own Railo server, or requiring a password. Make
sure intense scripts don't run at the same time as each other if
possible.
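The IP check can be as simple as this sketch (the addresses are
placeholders - use your own server's):

<!--- only let known addresses run the heavy import script --->
<cfif listFind("127.0.0.1,10.0.0.5", cgi.remote_addr) EQ 0>
  <cfheader statuscode="403" statustext="Forbidden">
  <cfabort>
</cfif>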
Lyle and I both have to deal with importing large amounts of data from
many remote sources throughout the day, so I'd encourage you to look
at your import process and see if the database or the CFML is holding
locks too long and causing other requests to queue up. I use the
database's bulk insert features and only update about 30 listings on
each loop to reduce the amount of time spent in a locked state. I also
run most of my queries against a memory table (MySQL MEMORY engine)
instead of a disk-based table. Memory tables update more quickly, and
it doesn't matter if the disk table is slower. It's really impressive
on real estate search because you can't put an index on all of the
search fields - some queries need to be doing distance calculations or
full-text search, so this gets a lot more complicated than just a
memory table, but I have a number of optimizations that combine CFML
shared memory, disk and memory tables, and tons of pregenerated
information.
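To illustrate the batching idea (not my actual import code - the
table, the columns and the listings array are invented for the
example), a multi-row insert of about 30 listings per statement looks
like this:

<cfset batchSize = 30>
<cfloop from="1" to="#arrayLen(listings)#" index="i" step="#batchSize#">
  <cfquery datasource="mydsn">
    INSERT INTO listing_search (listing_id, price, beds)
    VALUES
    <cfloop from="#i#" to="#min(i + batchSize - 1, arrayLen(listings))#" index="j">
      <cfif j GT i>,</cfif>
      (<cfqueryparam cfsqltype="cf_sql_integer" value="#listings[j].id#">,
       <cfqueryparam cfsqltype="cf_sql_integer" value="#listings[j].price#">,
       <cfqueryparam cfsqltype="cf_sql_integer" value="#listings[j].beds#">)
    </cfloop>
  </cfquery>
</cfloop>

Each statement only holds its locks for the short time it takes to
insert one small batch.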
Also, large deletes are massively slow unless you configure the
database and write your queries differently. A query wouldn't generate
the memory error on its own, but your other scripts waiting to run
could cause a crash when the query finishes, because you'd have
unusually high load for a few seconds after the blocking query.
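On MySQL, one way to keep deletes from blocking everything is to chunk
them with LIMIT; a sketch (the table name and chunk size are
examples):

<cfset rowsDeleted = 1>
<cfloop condition="rowsDeleted GT 0">
  <cfquery datasource="mydsn" result="delResult">
    DELETE FROM listing_search_old LIMIT 1000
  </cfquery>
  <cfset rowsDeleted = delResult.recordCount>
</cfloop>

Each small DELETE releases its locks quickly, so other requests can
interleave instead of queuing behind one giant statement.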
Another thing I built was a script that measures all the other pages
currently running and stores this in the server scope so I can monitor
it from another script. All of my web sites run on the same CFML app,
so I can add global features like this. I can see the last 3000
requests during an incident. This can occasionally help to discover
what was running prior to the errors and to understand where
performance problems are. I also include the JVM memory state in the
error notifications using this code:
<!--- query the JVM for its current heap figures, converted to megabytes --->
<cfset runtime = CreateObject("java","java.lang.Runtime").getRuntime()>
<cfset freeMemory = runtime.freeMemory() / 1024 / 1024>
<cfset totalMemory = runtime.totalMemory() / 1024 / 1024>
<cfset maxMemory = runtime.maxMemory() / 1024 / 1024>
<cfoutput>
Free Allocated Memory: #Round(freeMemory)#mb<br>
Total Memory Allocated: #Round(totalMemory)#mb<br>
Max Memory Available to JVM: #Round(maxMemory)#mb<br>
</cfoutput>
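The request-tracking part I mentioned can be a simple rolling buffer in
the server scope, written from onRequestStart or similar. A rough
sketch of the idea (the key name is mine; the 3000 cap matches what I
described above):

<cflock scope="server" type="exclusive" timeout="2" throwontimeout="false">
  <cfif NOT structKeyExists(server, "recentRequests")>
    <cfset server.recentRequests = []>
  </cfif>
  <!--- record this request, then trim the buffer to the last 3000 --->
  <cfset arrayAppend(server.recentRequests, {
    template = cgi.script_name,
    queryString = cgi.query_string,
    started = now()
  })>
  <cfif arrayLen(server.recentRequests) GT 3000>
    <cfset arrayDeleteAt(server.recentRequests, 1)>
  </cfif>
</cflock>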
I have achieved some impressive performance with a single 7200 RPM
hard drive on a Sandy Bridge 3.4GHz 6GB server running Railo and MySQL
5.5.x that costs $185/month.
http://www.sarasotaluxuryproperty.net/
I am able to use the slow hard drive because I eliminated nearly 100%
of the disk access on all requests. I do serve a high volume of hits
across 100 domains. There are 250 CFML requests per minute, with most
of them finishing under 100ms, plus the static requests. I used to
have 15k SAS drives and even Intel SSDs, but I found them to be a
waste of money after optimizing my code. Imagine if your 4 servers
could be reduced to 1 or 2 with lower-spec hardware. It could save
you $500 to $1500 per month.
I believe I have one of the fastest real estate map searches on the
market, which is largely achieved with indexing, grouping and memory
tables. You should also read about "database denormalization",
because performance requires you to carry more redundant data in your
table design so that your performance-sensitive queries need no joins.
If you rewrote the map search to query one narrow table only (no long
text fields) that resided in memory, you'd see a 10 times performance
improvement on your maps. The listing lookups are fast because they
rely on the primary key index, but the map search needs RAM caching to
go fast. Even fully cached InnoDB isn't fast enough. The server is in
Tampa, Florida.
example search:
http://www.sarasotaluxuryproperty.net/z/_a/listing/search-form
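To make the denormalization idea concrete, here is a rough sketch of
what a narrow, memory-resident search table could look like (the name
and columns are invented for illustration - yours would match your own
search filters):

<cfquery datasource="mydsn">
  CREATE TABLE listing_map_search (
    listing_id INT NOT NULL,
    price INT NOT NULL,
    beds TINYINT NOT NULL,
    latitude DECIMAL(9,6) NOT NULL,
    longitude DECIMAL(9,6) NOT NULL,
    city_id SMALLINT NOT NULL,
    PRIMARY KEY (listing_id),
    KEY idx_price (price),
    KEY idx_city (city_id)
  ) ENGINE=MEMORY
</cfquery>

Every column the map search filters on lives in this one table, so the
hot query never joins and never touches disk.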
You may want to look into the things I have done as a way to reduce
costs / improve response time on your sites.
If you need any additional assistance, or want to see if there is any
interest in trading services, you can contact me through my web site.
It's not that easy to find affordable ColdFusion experts, and I've
been doing this for about 8 years now.
Bruce Kirkpatrick
http://www.realtyontop.com/
On Feb 9, 8:22 pm, Lyle Karstensen <l...@realistiq.com> wrote:
> I want to put in my .02 since I have been dealing with this also. Make sure you are not getting a large number of errors. I have been dealing with Out of Memory errors again recently and found that I had a page that I had no idea was generating errors. I wrote a simple chunk of code in my Application.cfc onError function that emails me the details of any error that occurs on my sites. I had NO IDEA that this page was generating an error, and quite a few others were too. Now my out of memory errors are gone.
>
> I have been running my sites on Railo for about a year now and I can tell you without a doubt it is stable. Our servers process more than 2 million requests a day across 4 servers and had no issues for more than 6 months. Then after a recent upgrade we started getting memory leaks. I have found that if your site generates errors, Java will not clean up after that code error and you will run out of memory, especially if you are working with so little memory that you just have no room for error.
>
> Lyle Karstensen Jr.
> Chief Executive Officer (CEO)
>
> Phone: 702.940.4200
> Fax: 702.940.4201
> Direct: 702.932.8200
> Cell: 702.683.3666
> Email: l...@realistiq.com