Hi Klemen,
I don't think there is an ultimate solution for everyone, but one piece of advice is: keep it as simple as possible. A lot of it depends on how much traffic you have, and whether you can actually cache your content (and how aggressively).
Regarding Varnish (and other transparent caches): there is some rudimentary support for a cache in front of a SilverStripe install via HTTP::set_cache_age(), which might help alleviate your load. By default all responses come with cache-murdering headers, but if you set this to >0 it will add appropriate Cache-Control and Vary headers that Varnish should pick up. Otherwise Varnish will only cache static files (assets).
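For example, a minimal sketch assuming SilverStripe 3.x - the 300-second value and putting it in Page_Controller::init() are just illustrative, pick whatever suits your content:

    <?php
    // mysite/code/Page.php
    class Page_Controller extends ContentController {
        public function init() {
            parent::init();
            // With a non-zero age SilverStripe sends Cache-Control and Vary
            // headers that let a shared cache like Varnish keep the response,
            // instead of the default "never cache this" headers.
            HTTP::set_cache_age(300); // 5 minutes
        }
    }

Keep in mind anything session-specific (logged-in users, forms with CSRF tokens) shouldn't be cached, so it's safest to start with a short age on clearly public pages.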
One thing to be aware of is that you'll need to rewrite the User-Agent headers in Varnish to sort all the crazy User-Agent strings into a couple of distinct classes, or remove them. Otherwise Varnish will try to generate a cached version for every User-Agent in the world. If you are running a responsive site and don't do any agent-sniffing in your PHP code, you could also remove the Vary: User-Agent header completely (but there is no switch for that in core... ugh).
If you are intending to load-balance using Varnish, you will need some kind of sticky sessions to still allow logins (otherwise it will round-robin between backends and you will have to log in on every request :-D ). I have something that does this - let me know and I can extract it for you.
For high-traffic sites you can use
https://github.com/silverstripe-labs/silverstripe-staticpublishqueue/ to generate an HTML/PHP dump of the site, and then serve static HTML files. If a file is not available, it can fall back to a dynamic PHP request (or you could block that). This is at the moment the only solution I know of that handles cache invalidation: when you publish, the HTML file gets updated. I'm not sure if there is anything that would make SS ping a cache (Redis? Varnish?) once new content is available. I'd be keen to know if there is anything :-)
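If you wanted to roll that ping yourself, here is roughly what I'd try (just a sketch: the extension class and host name are made up, it assumes SilverStripe 3.x with curl available, and your Varnish VCL would need to be set up to accept PURGE requests from the web server):

    <?php
    // mysite/code/VarnishPurgeExtension.php
    // Apply to SiteTree via the "extensions" config so onAfterPublish fires.
    class VarnishPurgeExtension extends DataExtension {
        public function onAfterPublish(&$original) {
            // Ask the cache to drop its copy of the page that was just published.
            $url = 'http://your-varnish-host' . $this->owner->Link(); // hypothetical host
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PURGE');
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_exec($ch);
            curl_close($ch);
        }
    }

That only purges the page that was published; anything derived from it (menus, listing pages) would need its own purge, which is exactly the dependency problem staticpublishqueue already deals with.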
So here are the things I'd probably try, in order of amount of work necessary:
- partial caching (<% cached %>) and other optimisations (e.g. find the slowest URLs and fix those in code)
- if you have Varnish in front, set HTTP::set_cache_age() to a non-zero value (remember the User-Agent Vary header - perhaps just pop it off in your VCL config!)
- add more resources to your server: e.g. if you are regularly getting 1 interactive request per second, maybe it makes sense to add a second core and more RAM first.
- try the staticpublishqueue module (this is easier than it seems, although you have to think up front about the relationships between pages - i.e. which nodes update which, and what is publishable). This helps massively, unless you have a lot of interactive requests that are un-cacheable (e.g. form submissions)
- try Varnish load balancing and add more machines - if your load is on Apache, not in the DB - as long as you keep one DB it's not going to be very hard (apart from the sticky-session code, where I can help you out).
- write your own caching system - if you plan aggressive caching (e.g. for hours) this will probably require writing some code to invalidate the cache when the content changes, and also some way of defining dependencies (staticpublishqueue already has that: e.g. if a page changes, you might also want to update menus on other nodes). There's a rough sketch of this just below the list.
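On that last point, this is roughly what I mean - a hand-rolled cache around one expensive fragment, invalidated on publish, using SS_Cache from SilverStripe 3.x (the 'ExpensiveMenu' name, template and extension are all made up for the example):

    <?php
    // Cache an expensive fragment so it can be served for hours.
    // (Bump the default lifetime with SS_Cache::set_cache_lifetime()
    // in _config.php if you want it to live longer.)
    class Page_Controller extends ContentController {
        public function ExpensiveMenu() {
            $cache = SS_Cache::factory('ExpensiveMenu');
            if ($html = $cache->load('menu')) {
                return DBField::create_field('HTMLText', $html);
            }
            $html = (string) $this->renderWith('ExpensiveMenu'); // the slow bit
            $cache->save($html, 'menu');
            return DBField::create_field('HTMLText', $html);
        }
    }

    // Invalidate on publish (apply to SiteTree as an extension) so the long
    // lifetime never means serving stale content.
    class MenuCacheInvalidator extends DataExtension {
        public function onAfterPublish(&$original) {
            SS_Cache::factory('ExpensiveMenu')->remove('menu');
        }
    }

The hard part is knowing which keys to drop for a given change once you have more than a couple of them - that's the dependency-definition bit.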
And then at the faaar end:
- horizontal scaling that includes the DB - pretty much any kind of partitioning or clustering where you have multiple DBs. It would probably help to get familiar with the CAP theorem first. This blogger tested a lot of distributed systems and found out that with the default settings you start losing quite a bit of data, so it's maybe worth skimming too:
http://aphyr.com/tags/jepsen
m