Scale is one of those weasel words of our industry because it means
different things to different people. I'm presently on a project where
what Joomla does will have relatively little impact on scalability,
and we're looking at pre-computing and caching huge amounts of
data just to get it returned in a reasonable period of time. Number
of users in this system: 5000. That's not concurrent users, that's our
total. We're not putting this on slouches of servers either: from
memory, our three DB servers are 72GB Oracle T4 boxes.
A while ago I went to a conference presentation by a Google
engineer about how hard it would be to put a real time hit
counter on the Google home page. It started with the most basic
implementation and then worked through adding load: you are forced to
scale out, and latency grows as you have to ask multiple servers for
responses. Eventually you shift incrementing the counter to a
backend process that reads the log files and pushes updates to
the cluster. Then that hits its limits too, so you run many of
those processes; when each one finishes counting a set, it asks for
the max counter value, adds its own count and pushes the result back
out to everything. On the front end you cheat and increment the
counter artificially, using either a crude algorithm (n hits per
second, cache is x seconds old) or an internal cache that fakes
incrementing, with session affinity on the load balancers to help
maintain the illusion. You can see a piece of this with YouTube, where
the counter doesn't get incremented in real time. Something as simple
as a hit counter at "scale" is far from simple.
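To make the front end "cheat" concrete, here is a minimal sketch of the crude algorithm (n hits per second, cache is x seconds old). It is not what Google actually runs; the names CachedCount and displayed_count are made up for illustration, and it assumes the backend hands the front end a recent average hit rate along with the last real total.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCount:
    value: int              # last real total pushed out by the backend counters
    fetched_at: float       # when that total was cached (seconds since epoch)
    hits_per_second: float  # assumed: recent average rate supplied by the backend


def displayed_count(cached: CachedCount, now: Optional[float] = None) -> int:
    """Estimate the current count without asking the backend again."""
    if now is None:
        now = time.time()
    # Guard against clock skew making the cache look like it is from the future.
    age = max(0.0, now - cached.fetched_at)
    # Extrapolate: real total plus the hits we assume arrived since it was cached.
    return cached.value + int(cached.hits_per_second * age)
```

The number shown is only an estimate, and it gets corrected whenever the backend pushes a fresh total, which is why session affinity matters: the same visitor should keep seeing the same node's estimate.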
I disagree that it is the CMS's job to handle infrastructure tasks.
It's the job of the monitoring system to keep a check on performance
and react to load. Being able to 'pop up' extra nodes in a
cluster is not a trivial job, as anyone who has written bare metal
deploy scripts or built out template VM images will know (more
recently, particularly with VMware, AWS and to an extent Azure; really,
x86 virtualisation that doesn't suck). Instead, a proper management
system should be used to handle provisioning these environments and
monitoring load. The CMS shouldn't need to be aware that you've added
extra web nodes to that cluster - it should just handle the requests.
The CMS shouldn't have an awareness of the F5 in front of it or need
to know how to configure it. It shouldn't need to know that one of the
backend database servers fell over and that a new one needs
provisioning; there are plenty of HA solutions that handle that. It
should just continue to connect to the VIP and let the other layer
handle itself, while handling an absolute failure as gracefully as it
can. Similar deal for the caching layer: the CMS should handle a cache
failure gracefully, which may mean your DB layer gets more load than
it was expecting (and vice versa, the caching layer may be able to
keep parts of the site up if the DB fails).
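As a rough sketch of what "handle failure gracefully" could look like at the application level: the function below tries the cache, and if the cache layer is down it simply goes to the database instead of erroring out. This is not Joomla code; fetch_page, CacheUnavailable and the three callables are hypothetical stand-ins for whatever cache and DB clients you actually use.

```python
from typing import Callable, Optional


class CacheUnavailable(Exception):
    """Raised by the (hypothetical) cache client when the cache layer is down."""


def fetch_page(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    cache_set: Callable[[str, str], None],
    db_load: Callable[[str], str],
) -> str:
    try:
        cached = cache_get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        # Cache layer is down: serve from the DB and accept the extra load.
        return db_load(key)
    # Cache miss: load from the DB and try to store the result.
    value = db_load(key)
    try:
        cache_set(key, value)
    except CacheUnavailable:
        pass  # failing to populate the cache must not fail the request
    return value
```

Note that the application only knows "the cache did not answer"; it does not know or care why, and it does nothing to repair the cache. That stays with the HA and monitoring layers.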
Only with recent virtualisation advances are we able to spot deploy
environments, and even then if you don't have the underlying hardware
to support it you're stuck anyway (again, less of an issue for
"cloud" VM solutions). However, if your CMS/app level is aware of all
of that then you've done something wrong - or you're trying to avoid
re-using the myriad tools that will help you out. In fact I'd argue
that if the CMS is doing all of that from a click of a button or two
in its UI - that's a recipe for much pain.
Whether an unmodified Joomla instance will handle all of that at the
level of traffic you're asking about depends on a deeper understanding
of exactly what you're doing, what caching options you can utilise and
the level of personalisation that you are expecting to deliver. My
personal suggestion is that you take the time to build up a system,
import your base data and then replay traffic from your live site back
onto the Joomla site and see how it goes. Capture traffic streams and
map them onto Joomla to see what performance would be like for flows
of your sample users, look at how many concurrent users you have and
then ramp up the traffic streams against the Joomla site to emulate
that load.
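One simple way to ramp captured traffic, sketched below, is to keep the request order and compress the gaps between requests, so a ramp of 2.0 replays the same stream twice as fast. This is only one possible approach and the names are invented for illustration; it assumes you have already reduced your capture to (timestamp, path) pairs and mapped the paths onto their Joomla equivalents.

```python
from typing import List, Tuple

# (capture timestamp in seconds, request path already mapped to the Joomla site)
Request = Tuple[float, str]


def replay_schedule(captured: List[Request], ramp: float) -> List[Request]:
    """Return (offset_seconds, path) pairs with the timeline compressed by `ramp`."""
    if ramp <= 0:
        raise ValueError("ramp must be positive")
    if not captured:
        return []
    ordered = sorted(captured)  # replay in capture order
    start = ordered[0][0]
    # Offsets are relative to the first request; dividing by ramp speeds things up.
    return [((ts - start) / ramp, path) for ts, path in ordered]
```

A load generator would then fire each request at its offset from the start of the run. Start at a ramp of 1.0 to confirm the test site matches live behaviour, then increase it until response times fall apart.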
It also depends on how much hardware you can put out there and how
skilled you are at tuning the low level stuff. I've run into strange
low level bugs[1] in various systems even with some of the simpler
stuff I've worked on. Suffice to say that Joomla is a piece of the
puzzle, perhaps the largest piece, but not the only piece. And you're
likely going to need some tuning, though I don't understand why none
of these fixed SQL queries have made it back as patches.
Cheers,
Sam Moffatt
http://pasamio.id.au
[1] http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
> You received this message because you are subscribed to the Google Groups
> "Joomla! General Development" group.
> To post to this group, send an email to joomla-de...@googlegroups.com.
> To unsubscribe from this group, send email to
> joomla-dev-gene...@googlegroups.com.