This is in response to comments in the future of happstack thread, but
I hope that it will generate enough discussion that it will deserve
its own thread.
I have seen a number of people (in that thread and other places)
express the feeling that happstack's persistence layer is fine for
small applications, but they are concerned that it does not scale to
midsize applications, and that they would prefer to use something like
MySQL instead.
My counter-argument is that these people are not looking far enough
down the line of scalability, and that in fact, happstack is perhaps
the only technology that has the potential to smoothly scale from very
small to very large. If you look at sites like Google, Amazon, and
Facebook, you will very clearly see that MySQL/SQL does not scale.
To really understand what I am talking about, I highly recommend
watching this presentation on the Facebook architecture:
You will learn some very interesting things about how Facebook works:
- their mysql tables do *not* use ACID transactions (too slow)
- their sql tables only have two columns; they are just key/value
stores
- they do *not* do SQL joins -- too slow. Instead they return the
rows to PHP and do the 'joins' in PHP. Obviously, this takes more CPU
power (because there is no way that PHP is faster than MySQL). But it
turns out that adding more database servers to the pool is really
hard, while adding more PHP frontend servers is easy.
- in order to get their queries fast enough they have 800 memcached
servers with 28 terabytes of RAM to cache SQL query results. (As of
December 2008. They are looking to buy, or recently bought 50,000 new
servers, so they probably have even more now).
- despite all those memcached and sql servers, they still have to
have custom code which archives data out of their MySQL databases into
a different long-term storage medium (with longer access times) to
keep their MySQL databases running fast enough.
- in order to make their image serving platform (haystack) fast
enough, they don't even use a normal filesystem. They store all the
meta-data for each image in RAM. The meta-data includes the block
number of the image on the disk and the length. This way serving an
image only requires one disk seek instead of two.
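To make the haystack idea concrete, here is a minimal sketch of the scheme described above: all per-image meta-data (byte offset and length within one big store file) lives in an in-RAM map, so serving an image costs a single seek plus a single read. The names (`PhotoId`, `readPhoto`) are my own inventions for illustration, not Facebook's actual API:

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.Map as Map
import System.IO

-- All photo meta-data lives in RAM: a map from photo id to the
-- photo's byte offset and length inside one big store file.
type PhotoId    = Int
type PhotoIndex = Map.Map PhotoId (Integer, Int)  -- (offset, length)

-- Serving a photo is a single seek plus a single read; there is no
-- per-photo filesystem lookup because the "filesystem" is the map.
readPhoto :: Handle -> PhotoIndex -> PhotoId -> IO (Maybe BS.ByteString)
readPhoto h idx pid =
  case Map.lookup pid idx of
    Nothing         -> return Nothing
    Just (off, len) -> do
      hSeek h AbsoluteSeek off
      Just <$> BS.hGet h len
```

The point is that the expensive part (directory lookups, inode reads) has been replaced by a pure in-RAM `Map.lookup`.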
It's pretty clear that Facebook is not using MySQL as a transactional,
relational database. They are basically just using it as a thin
wrapper over berkeley DB (which is what provides the b-tree disk
storage used by MyISAM). I suspect they would prefer to use BerkeleyDB
directly instead of MySQL, except that the cost of making the switch
is too high.
So what can happstack take away from this?
First, to have a very large responsive site, you have to avoid disk
access as much as possible. I forget the exact statistic, but I think
they have a memcached hit rate of > 90%. (ie, >90% of the time, they
can get the data from memcached instead of having to hit the
database.)
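The read-through caching pattern behind that hit rate can be sketched in a few lines of Haskell. Here an `IORef`'d `Map` stands in for the memcached pool and an arbitrary IO action stands in for the MySQL query; both stand-ins (and the name `cachedLookup`) are mine, purely for illustration:

```haskell
import Data.IORef
import qualified Data.Map as Map

-- A read-through cache: check the in-RAM cache first; only on a
-- miss do we pay for the expensive backing lookup, and we remember
-- the answer for next time.
cachedLookup :: Ord k
             => IORef (Map.Map k v)  -- in-RAM cache (memcached stand-in)
             -> (k -> IO v)          -- slow backing store (MySQL stand-in)
             -> k -> IO v
cachedLookup cacheRef fetch k = do
  cache <- readIORef cacheRef
  case Map.lookup k cache of
    Just v  -> return v                        -- hit: no disk access
    Nothing -> do
      v <- fetch k                             -- miss: hit the database
      modifyIORef' cacheRef (Map.insert k v)   -- populate for next time
      return v
```

A real memcached deployment adds invalidation and distribution across servers, which is exactly where the "tweaking and patching" effort goes.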
Second, even with a sophisticated database, a general purpose disk
storage algorithm is not enough. You ultimately need a system that is
customized for your app and knows the data access patterns and when to
move things to slower/faster access locations.
Third, ACID transactions really don't scale. (Amazon does not use them
either, and Google's map-reduce does not have them either, as far as I
know.)
Fourth, a one-size-fits-all solution might scale to a medium-sized
site, but won't scale to a really big one. Hence, a good solution is
likely to be very modular and very extensible so that it can be
customized to the specific needs of the site.
If you look at the Facebook architecture, it seems like they are
doing things fairly backwards. Their desire is to have all the
relevant data in RAM, but their persistent storage layer (aka, MySQL)
is inherently based on disk storage. In order to get what they want
they have to have a fancy caching layer, and do a lot of tweaking and
patching to memcached and MySQL to try to get the right things in the
cache. Furthermore, they have to prune things from the SQL database to
keep it responsive enough.
It seems like what they really want is a system which starts with all
the data in RAM (where they want it) and gives them explicit control
over taking data out of RAM and storing it on disk based on the usage
patterns and metrics that are specific to their site. (And, when
storing to disk, they would probably forgo having a normal filesystem,
instead just storing block numbers in the RAM-based persistence layer,
much as haystack already does.)
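The RAM-first design described above can be caricatured in a few lines: keep the authoritative state in RAM, append each update event to a durable log before applying it, and rebuild the state by replaying the log at startup. This is only a toy version of the idea behind happstack-state's MACID approach (no checkpoints, naive `Show`/`Read` serialization, and all the names are mine):

```haskell
import Data.IORef
import System.IO

-- Toy MACID-style store: authoritative state lives in RAM; every
-- update is appended to a durable event log before it is applied,
-- so the state can be rebuilt by replaying the log at startup.
data Store st ev = Store
  { stateRef :: IORef st
  , logH     :: Handle
  , apply    :: ev -> st -> st
  }

update :: Show ev => Store st ev -> ev -> IO ()
update s ev = do
  hPutStrLn (logH s) (show ev)            -- 1. durably record the event
  hFlush   (logH s)
  modifyIORef' (stateRef s) (apply s ev)  -- 2. apply it to the RAM state

-- Rebuild the RAM state by replaying the serialized event log.
replay :: Read ev => (ev -> st -> st) -> st -> String -> st
replay f st0 logText = foldl (flip f) st0 (map read (lines logText))
```

Queries never touch the disk at all; only updates pay for an (append-only, sequential) write.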
So, I would argue that happstack-state is, in fact, the right
approach. It works for small sites, and it is what very large sites
end up implementing in an ad hoc way. If you have a site which you hope
will grow from small to very large, you would ideally want to use a
technology which can scale from small to very large -- because you are
not going to have time to do a rewrite once your site starts to take
off (something they talk about in the presentation a fair bit).
The problem with happstack-state is not that it is the wrong approach,
but that it is very incomplete. It does not yet provide all the pieces
that you need to build your application's scalable persistence layer.
For example, I think it is sensible to expect that many sites (even
ones with 28TB of RAM) will need to store things on disk. So, clearly
there needs to be a sensible way to do that using happstack-state.
Some people have suggested that happstack-state should provide this
functionality automatically. I disagree slightly. I think that
happstack-state should provide the hooks so that it can be done, *and*
it should provide a default implementation/policy which uses those
hooks so that you can use it in your app. But later, you should be
able to replace that mechanism with a custom one that is fine tuned to
your specific site. Forcing the use of a specific policy won't scale.
Instead people will be forced to build some ad hoc mechanism on top of
it.
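As a sketch of what "hooks plus a replaceable default policy" could look like -- and this is my speculation about the API shape, not anything happstack-state ships today -- the hooks could simply be a record of functions that an application overrides:

```haskell
-- A pluggable archiving policy: the hooks are just a record of
-- functions, so an application can start with the default and later
-- swap in a policy tuned to its own access patterns.
data ArchivePolicy k = ArchivePolicy
  { shouldArchive :: k -> Int -> Bool  -- key and seconds since last access
  , onArchive     :: k -> IO ()        -- move the value to slower storage
  }

-- Default policy: archive anything untouched for over an hour.
defaultPolicy :: ArchivePolicy k
defaultPolicy = ArchivePolicy
  { shouldArchive = \_ age -> age > 3600
  , onArchive     = \_ -> return ()    -- no-op placeholder
  }

-- A site-specific policy only overrides what it cares about:
-- evict quickly, but never evict the front page.
photoSitePolicy :: ArchivePolicy String
photoSitePolicy = defaultPolicy
  { shouldArchive = \key age -> age > 60 && key /= "front-page" }
```

The default policy gets people running out of the box, while record update syntax lets a large site replace exactly the pieces that no longer scale for it.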
I should note that Alex and company were reading the reports from very
large sites like Amazon when they designed happstack-state. So it is
not luck that happstack-state seems setup to scale really big. It has
always been an essential part of the plan.
I will happily agree, though, that if you expect your site to be
medium-sized in the next couple of months -- MySQL does look pretty
attractive
because it is here today :)