Hey Guys,
While browsing, I came across a tutorial that fills a real need in
present-day web development. Just wanted to share it.
-Pavan
When your Ruby On Rails Website gets famous you're going to wish you
implemented proper caching. Are you worried? Maybe just a little?
This tutorial is going to show everything you need to know to use
Caching in your Rails applications, so when you get digg'd or
slashdot'd you won't be left begging your hosting provider for more
CPU processing power.
Since there are so many different types of caching, I'm going to split
this up into several blog entries. Each one will build on the
previous, talking about more complex types of caching and how to
implement them. We'll even discuss some advanced caching plugins
people have written for customized caching.
Today we're going to dive into the FASTEST rails caching mechanism,
page caching!
Table of Contents
Why for art thou caching?
Configuration
Page Caching
Page caching with pagination
Cleaning up your cache
Sweeping up your mess
Playing with Apache/Lighttpd
Moving your cache
Clearing out your whole/partial cache
Advanced page caching techniques
Testing your page caching
Conclusion
Why for art thou caching?
(Feel free to skip this if you're a l33t hax0r)
Ruby is what we call an "Interpreted Programming Language" (as you
probably already know). What this means is that your code does not get
translated into machine code (the language your computer talks) until
someone actually runs it.
If you're a PHP developer, you're probably saying "No Duh!" about now.
PHP is also an "Interpreted Language". Java code, on the other hand,
needs to be compiled before it can be executed.
Unfortunately this means that every time someone surfs onto your Ruby
on Rails website, your code gets read and processed that instant. As
you can probably imagine, handling more than 100 requests a second can
take a great deal of processor power. So how can we speed things up?
Caching!
Caching, in the web application world, is the art of taking a
processed web page (or part of a webpage), and storing it in a
temporary location. If another user requests this same webpage, then
we can serve up the cached version.
Loading up a cached webpage can not only save us from having to do ANY
database queries, it can even allow us to serve up websites without
touching our Ruby on Rails Server. Sounds kinda magical, doesn't it?
Keep on reading for the good stuff.
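If you'd like to see the basic idea in miniature before we touch Rails, here's a rough sketch in plain Ruby. Nothing here is a Rails API: fetch_page is a made-up helper, and the block stands in for whatever expensive work (database queries, rendering) builds the page.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: returns the cached copy if there is one, otherwise
# runs the (expensive) block once and stores its result for next time.
def fetch_page(cache_dir, name)
  file = File.join(cache_dir, "#{name}.html")
  return File.read(file) if File.exist?(file)
  html = yield
  File.write(file, html)
  html
end

Dir.mktmpdir do |cache_dir|
  renders = 0
  2.times do
    fetch_page(cache_dir, "list") { renders += 1; "<html>posts</html>" }
  end
  puts renders  # the block only ran for the first request
end
```

The second request never runs the block, which is the whole point: the work is done once and the stored file is reused.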
Before we get our feet wet, there's one small configuration step you
need to take.
Configuration
There's only one thing you'll need to do to start playing with
caching, and this is only needed if you're in development mode. Look
for the following line and change it to true in your
/config/environments/development.rb:

    config.action_controller.perform_caching = true

Normally you probably don't want to bother with caching in development
mode, but we want to try it out already!
Page Caching
Page caching is the FASTEST Rails caching mechanism, so you should do
it if at all possible. Where should you use page caching?
- If your page is the same for all users.
- If your page is available to the public, with no authentication
  needed.
If your app contains pages that meet these requirements, keep on
reading. If it doesn't, you probably should know how to use it
anyways, so keep reading!
Say we have a blog page (Imagine that!) that doesn't change very
often. The controller code for our front page might look like this:

    class BlogController < ApplicationController
      def list
        # Assign to an instance variable so the view can display the posts
        @posts = Post.find(:all, :order => "created_on desc", :limit => 10)
      end
      ...

As you can see, our list action queries the latest 10 blog posts,
which we can then display on our webpage. If we wanted to use page
caching to speed things up, we could go into our blog controller and
do:

    class BlogController < ApplicationController
      caches_page :list

      def list
        @posts = Post.find(:all, :order => "created_on desc", :limit => 10)
      end
      ...

The "caches_page" directive tells our application that next time the
"list" action is requested, take the resulting html, and store it in a
cached file.
If you ran this code using Mongrel, the first time the page is viewed
your /log/development.log would look like this:

    Processing BlogController#list (for 127.0.0.1 at 2007-02-23 00:58:56) [GET]
      Parameters: {"action"=>"list", "controller"=>"blog"}
    SELECT * FROM posts ORDER BY created_on LIMIT 10
    Rendering blog/list
    Cached page: /blog/list.html (0.00000)
    Completed in 0.18700 (5 reqs/sec) | Rendering: 0.10900 (58%) | DB: 0.00000 (0%) | 200 OK [http://localhost/blog/list]

See the line where it says "Cached page: /blog/list.html". This is
telling you that the page was loaded, and the resulting html was
stored in a file located at /public/blog/list.html. If you looked in
this file you'd find plain html with no ruby code at all.
Subsequent requests to the same URL will now hit this HTML file rather
than reloading the page. As you can imagine, serving a static HTML
page is much faster than loading and running code in an interpreted
programming language. It can easily be something like 100 times faster!
However, it is very important to note that loading page-cached .html
files does not invoke Rails at all! What this means is that if there
is any content that is dynamic from user to user on the page, or the
page is secure in some fashion, then you can't use page caching.
Rather you'd probably want to use action or fragment caching, which I
will cover in part 2 of this tutorial.
What if we then say in our controller:

    caches_page :show

Where do you think the cached page would get stored when we visited
"/blog/show/5" to show a specific blog post?
The answer is /public/blog/show/5.html
Here are a few more examples of where page caches are stored:

    http://localhost:3000/blog/list => /public/blog/list.html
    http://localhost:3000/blog/edit/5 => /public/blog/edit/5.html
    http://localhost:3000/blog => /public/blog.html
    http://localhost:3000/ => /public/index.html
    http://localhost:3000/blog/list?page=2 => /public/blog/list.html

Hey, wait a minute, notice how the first item above maps to the same
file as the last item? Yup, page caching is going to ignore all
additional parameters on your URL.
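If it helps to see that mapping spelled out, here's a rough sketch in plain Ruby. To be clear, page_cache_file is a made-up helper and not how Rails implements it internally; it only mimics the URL-to-file mapping described above, assuming the default /public cache directory.

```ruby
require "uri"

# Hypothetical helper (not part of Rails): mimics how page caching maps a
# request URL onto a file under the cache directory.
def page_cache_file(url, cache_dir = "/public")
  path = URI.parse(url).path          # the query string is dropped here
  path = "/index" if path.empty? || path == "/"
  path = path.chomp("/")
  "#{cache_dir}#{path}.html"
end

puts page_cache_file("http://localhost:3000/blog/list")
puts page_cache_file("http://localhost:3000/blog/list?page=2")
puts page_cache_file("http://localhost:3000/")
```

The first two calls print the same path, which is exactly the collision that makes "?page=2" useless for page caching.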
But what if I want to cache my pagination pages?
Very interesting question, and a more interesting answer. In order to
cache your different pages, you just have to create a differently
formed url. So instead of linking "/blog/list?page=2", which wouldn't
work because caching ignores additional parameters, we would want to
link using "/blog/list/2", but instead of 2 being stored in
params[:id], we want that 2 on the end to be params[:page].
We can make this configuration change in our /config/routes.rb:

    map.connect 'blog/list/:page',
        :controller => 'blog',
        :action => 'list',
        :requirements => { :page => /\d+/ },
        :page => nil

With this new route defined, we can now do:

    <%= link_to "Next Page", :controller => 'blog', :action => 'list', :page => 2 %>

The resulting URL will be "/blog/list/2". When we click this link two
great things will happen:
- Rather than storing the 2 in params[:id], which is the default, the
  application will store the 2 as params[:page].
- The page will be cached as /public/blog/list/2.html
The moral of the story is: if you're going to use page caching, make
sure all the parameters you require are part of the URL, not after the
question mark! Many thanks to Charlie Bowman for inspiration.
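To make the route's behavior a bit more concrete, here's a small plain-Ruby approximation. This is not the Rails router: LIST_ROUTE and match_list_route are made-up names, and the regex is my assumption about what the route with its /\d+/ requirement and nil default boils down to.

```ruby
# Hypothetical stand-in for the 'blog/list/:page' route: the page number
# must be digits only, and it is optional (it defaults to nil).
LIST_ROUTE = %r{\A/blog/list(?:/(\d+))?\z}

def match_list_route(path)
  match = LIST_ROUTE.match(path)
  return nil unless match
  { :controller => "blog", :action => "list", :page => match[1] }
end

p match_list_route("/blog/list/2")    # page comes from the path segment
p match_list_route("/blog/list")      # no page segment, so :page is nil
p match_list_route("/blog/list/abc")  # fails the digits-only requirement
```

Because the page number lives in the path, each page gets its own file name when it is cached.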
Cleaning up the cache
You must be wondering, "What happens if I add another blog post and
then refresh /blog/list at this point?"
Absolutely NOTHING!!!
Well, not quite nothing. We would see the /blog/list.html cached file
which was generated a minute ago, but it won't contain our newest blog
entry.
To remove this cached file so a new one can be generated we'll need to
expire the page. To expire the two pages we listed above, we would
simply run:

    # This will remove /blog/list.html
    expire_page(:controller => 'blog', :action => 'list')

    # This will remove /blog/show/5.html
    expire_page(:controller => 'blog', :action => 'show', :id => 5)

We could obviously go and paste a bunch of expires into every place
where we add/edit/remove a post, but there is a better way!
Sweepers
Sweepers are pieces of code that automatically delete old caches when
the data on the cached page gets old. To do this, sweepers observe
one or more of your models. When a model is added/updated/removed the
sweeper gets notified, and then runs those expire lines I listed
above.
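Before we look at the real thing, here's the observer idea boiled down to plain Ruby. TinyPost and TinySweeper are made-up classes for illustration only; actual Rails sweepers hook into ActiveRecord callbacks rather than a hand-rolled observer list like this.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical, Rails-free sketch of the sweeper idea: a tiny model notifies
# its observers after a save, and the observer deletes the stale cache files.
class TinyPost
  attr_reader :id

  def self.observers
    @observers ||= []
  end

  def initialize(id)
    @id = id
  end

  def save
    # (a real model would write to the database here)
    self.class.observers.each { |observer| observer.after_save(self) }
  end
end

class TinySweeper
  def initialize(cache_dir)
    @cache_dir = cache_dir
  end

  def after_save(post)
    # rm_f silently ignores files that were never cached
    FileUtils.rm_f(File.join(@cache_dir, "blog", "list.html"))
    FileUtils.rm_f(File.join(@cache_dir, "blog", "show", "#{post.id}.html"))
  end
end

Dir.mktmpdir do |cache_dir|
  FileUtils.mkdir_p(File.join(cache_dir, "blog", "show"))
  list_file = File.join(cache_dir, "blog", "list.html")
  File.write(list_file, "<html>stale list</html>")

  TinyPost.observers << TinySweeper.new(cache_dir)
  TinyPost.new(3).save

  puts File.exist?(list_file)  # the stale list page is gone after the save
end
```

The model never needs to know which pages exist; it just announces the change and the sweeper decides what to throw away.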
Sweepers can be created in your controllers directory, but I think
they should be separated, which you can do by adding this line to
your /config/environment.rb:

    Rails::Initializer.run do |config|
      # ...
      config.load_paths += %W( #{RAILS_ROOT}/app/sweepers )
      # ...
    end

(don't forget to restart your server after you do this)
With this code, we can create an /app/sweepers directory and start
creating sweepers. So, let's jump right into it.
/app/sweepers/blog_sweeper.rb might look like this:

    class BlogSweeper < ActionController::Caching::Sweeper
      observe Post # This sweeper is going to keep an eye on the Post model

      # If our sweeper detects that a Post was created call this
      def after_create(post)
        expire_cache_for(post)
      end

      # If our sweeper detects that a Post was updated call this
      def after_update(post)
        expire_cache_for(post)
      end

      # If our sweeper detects that a Post was deleted call this
      def after_destroy(post)
        expire_cache_for(post)
      end

      private

      def expire_cache_for(record)
        # Expire the list page now that we posted a new blog entry
        expire_page(:controller => 'blog', :action => 'list')

        # Also expire the show page, in case we just edited a blog entry
        expire_page(:controller => 'blog', :action => 'show', :id => record.id)
      end
    end

NOTE: We can define "after_save" instead of "after_create" and
"after_update" above, to DRY up our code.
We then need to tell our controller when to invoke this sweeper, so
in /app/controllers/blog_controller.rb:

    class BlogController < ApplicationController
      caches_page :list, :show
      cache_sweeper :blog_sweeper, :only => [:create, :update, :destroy]
      ...

If we then try creating a new post we would see the following in our
log/development.log:

    Expired page: /blog/list.html (0.00000)
    Expired page: /blog/show/3.html (0.00000)

That's our sweeper at work!
Playing nice with Apache/Lighttpd
When deploying to production, many rails applications still use Apache
as a front-end, and dynamic Ruby on Rails requests get forwarded to a
Rails Server (Mongrel or Lighttpd). However, since we are actually
pushing out pure html code when we do caching, we can tell Apache to
check to see if the page being requested exists in static .html form.
If it does, we can load the requested page without even touching our
Ruby on Rails server!
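The decision the front-end web server makes can be sketched in plain Ruby like this. It's only an illustration of the logic, not something you'd deploy: static_candidate and route_request are made-up names, and a real setup does this in the web server config, not in Ruby.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch of what the rewrite rules accomplish: try the static
# .html file first, and only fall back to the Rails cluster on a miss.
def static_candidate(request_path)
  return "/index.html" if request_path == "/"
  return request_path if request_path.include?(".")  # already has an extension
  "#{request_path}.html"
end

def route_request(doc_root, request_path)
  candidate = File.join(doc_root, static_candidate(request_path))
  File.file?(candidate) ? [:static, candidate] : [:rails, request_path]
end

Dir.mktmpdir do |doc_root|
  FileUtils.mkdir_p(File.join(doc_root, "blog"))
  File.write(File.join(doc_root, "blog", "list.html"), "<html>cached</html>")

  p route_request(doc_root, "/blog/list").first    # a cached file exists
  p route_request(doc_root, "/blog/show/5").first  # nothing cached yet
end
```

Only the second request would ever reach Mongrel; the first is answered straight from disk.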
Our httpd.conf might look like this:

    <VirtualHost *:80>
      ...
      # Configure mongrel_cluster
      <Proxy balancer://blog_cluster>
        BalancerMember http://127.0.0.1:8030
      </Proxy>

      RewriteEngine On
      # Rewrite index to check for static
      RewriteRule ^/$ /index.html [QSA]

      # Rewrite to check for Rails cached page
      RewriteRule ^([^.]+)$ $1.html [QSA]

      # Redirect all non-static requests to cluster
      RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
      RewriteRule ^/(.*)$ balancer://blog_cluster%{REQUEST_URI} [P,QSA,L]
      ...
    </VirtualHost>

In lighttpd you might have:

    server.modules = ( "mod_rewrite", ... )
    url.rewrite += ( "^/$" => "/index.html" )
    url.rewrite += ( "^([^.]+)$" => "$1.html" )

The proxy servers will then look for cached files in your /public
directory. However, you may want to change the caching directory to
keep things more separated. You'll see why shortly.
Moving your Page Cache
First you'd want to add the following to your /config/environment.rb:

    config.action_controller.page_cache_directory = RAILS_ROOT + "/public/cache/"

This tells Rails to publish all your cached files in the /public/cache
directory. You would then want to change your Rewrite rules in your
httpd.conf to be:

    # Rewrite index to check for static
    RewriteRule ^/$ cache/index.html [QSA]

    # Rewrite to check for Rails cached page
    RewriteRule ^([^.]+)$ cache/$1.html [QSA]

Clearing out a partial/whole cache
When you start implementing page caching, you may find that when you
add/edit/remove one model, almost all of your cached pages need to be
expired. This could be the case if, for instance, all of your website
pages had a list which showed the 10 most recent blog posts.
One alternative would be to just delete all your cached files. In
order to do this you'll first need to move your cache directory (as
shown above). Then you might create a sweeper like this:

    class BlogSweeper < ActionController::Caching::Sweeper
      observe Post

      def after_save(record)
        self.class::sweep
      end

      def after_destroy(record)
        self.class::sweep
      end

      def self.sweep
        cache_dir = ActionController::Base.page_cache_directory
        unless cache_dir == RAILS_ROOT+"/public"
          FileUtils.rm_r(Dir.glob(cache_dir+"/*")) rescue Errno::ENOENT
          RAILS_DEFAULT_LOGGER.info("Cache directory '#{cache_dir}' fully sweeped.")
        end
      end
    end

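If you'd rather see what that guard and rm_r combination does before pointing it at a real app, here's a sketch that runs against a throwaway directory. sweep_cache is a made-up helper; its two arguments stand in for ActionController::Base.page_cache_directory and RAILS_ROOT + "/public".

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: empties cache_dir, but refuses to touch it when it is
# the same directory as public_dir (where images, stylesheets, etc. live).
def sweep_cache(cache_dir, public_dir)
  return false if File.expand_path(cache_dir) == File.expand_path(public_dir)
  FileUtils.rm_r(Dir.glob(File.join(cache_dir, "*")))
  true
end

Dir.mktmpdir do |public_dir|
  cache_dir = File.join(public_dir, "cache")
  FileUtils.mkdir_p(File.join(cache_dir, "blog"))
  File.write(File.join(cache_dir, "blog", "list.html"), "cached list")
  File.write(File.join(public_dir, "logo.png"), "not a cache file")

  puts sweep_cache(public_dir, public_dir)  # guard kicks in, nothing deleted
  puts sweep_cache(cache_dir, public_dir)   # cache contents removed
  puts Dir.children(cache_dir).empty?
end
```

This is why the cache has to be moved first: without a separate directory, wiping the cache would also wipe everything else in /public.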
That FileUtils.rm_r simply deletes all the files in the cache, which
is really all the expire_page calls do anyway. You could also do a
partial cache purge by only deleting a cache subdirectory. If I just
wanted to remove all the caches under /public/blog I could do:

    cache_dir = ActionController::Base.page_cache_directory
    FileUtils.rm_r(Dir.glob(cache_dir+"/blog/*")) rescue Errno::ENOENT

If calling these File Utilities feels too hackerish for you, Charlie
Bowman wrote up the broomstick plugin which allows you to
"expire_each_page" of a controller or action, with one simple call.
Needing something more advanced?
Page caching can get very complex with large websites. Here are a few
notable advanced solutions:
Rick Olson (aka Technoweenie) wrote up a Referenced Page Caching
Plugin which uses a database table to keep track of cached pages.
Check out the Readme for examples.
Max Dunn wrote a great article on Advanced Page Caching where he shows
you how he dealt with wiki pages using cookies to dynamically change
cached pages based on user roles.
Lastly, there doesn't seem to be any good way to page cache xml files,
as far as I've seen. Mike Zornek wrote about his problems and figured
out one way to do it. Manoel Lemos figured out a way to do it using
action caching. We'll cover action caching in the next tutorial.
How do I test my page caching?
There is no built-in way to do this in Rails. Luckily Damien Merenne
created a swank plugin for page cache testing. Check it out!
Conclusions
Page caching should be used if at all possible in your project,
because of the awesome speeds it can provide. However, if you have a
website with a member system where authentication is needed
throughout, then you might not be able to do much with it outside of a
login and new member form.
Ready to learn about the other Rails caching methods? Continue to
Part 2 of the tutorial.