Scaling a website up

ichudov

Aug 31, 2011, 10:48:59 PM
to perlbal
I have a website project algebra.com. Right now it gets about 61,000
visits per day, and in peak times, serves about 55 requests per second
(requests for HTTP objects, not pageviews).

This is all well and good. When this goes on, it uses about 30% of CPU
on my server, which makes me think about "scaling up".

I see the following stages of such scaling up:

Stage 0) Buy a bigger server. This can give me about 3x capacity.

Stage 1) Moving some functions that do not require the central
database, such as dynamic generation of formula images and image
serving in general, to another server. This can, roughly, double the
capacity.

Stage 2) Having several servers serving the same content, managed by
perlbal, but all hitting one MySQL server for data reads and writes.
This can, roughly, give me 10x capacity (see the config sketch after
this list).

Stage 3) Having many servers serving content, with replication
providing many MySQL slaves for read access and one central server for
write requests. This can, roughly, give me 50-100x capacity.
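
For Stage 2, a minimal perlbal reverse-proxy config might look roughly
like the sketch below (the IPs, ports, and names are made up):

  # Hypothetical stage-2 setup: one perlbal balancing several web servers.
  CREATE POOL webservers
    POOL webservers ADD 10.0.0.11:80
    POOL webservers ADD 10.0.0.12:80
    POOL webservers ADD 10.0.0.13:80

  CREATE SERVICE balancer
    SET listen          = 0.0.0.0:80
    SET role            = reverse_proxy
    SET pool            = webservers
    SET persist_client  = on
    SET persist_backend = on
    SET verify_backend  = on
  ENABLE balancer

Each web server stays identical, so adding capacity is just another
POOL ... ADD line.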

Am I missing anything?

thanks!

Brad Fitzpatrick

Aug 31, 2011, 10:52:24 PM
to per...@googlegroups.com
Stage 4) shard.

If you have 1 master and n slaves, that doesn't give you n times more read capacity, because each of the n slaves is still doing all the writes for the whole world, and eventually all 1+n of your databases will spend all their time doing nothing but writes.

Eventually you need to split your database up into many small databases.  Then you can still have a master, but all it contains is an index of which users/content are on which shard number(s).
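
A rough sketch of that lookup in Perl (the DSNs, table, and column
names here are invented just for illustration):

  #!/usr/bin/perl
  # The "global" master only maps a user to a shard number; all of that
  # user's actual data lives on the shard's own small database.
  use strict;
  use warnings;
  use DBI;

  my $global = DBI->connect('DBI:mysql:database=global;host=db-master',
                            'user', 'password', { RaiseError => 1 });

  my @shard_dsn = (
      'DBI:mysql:database=shard0;host=db-shard0',
      'DBI:mysql:database=shard1;host=db-shard1',
  );

  sub dbh_for_user {
      my ($user_id) = @_;
      my ($shard) = $global->selectrow_array(
          'SELECT shard_id FROM user_shard WHERE user_id = ?',
          undef, $user_id);
      die "no shard mapping for user $user_id" unless defined $shard;
      return DBI->connect($shard_dsn[$shard], 'user', 'password',
                          { RaiseError => 1 });
  }

  # Every read and write for this user then goes to their own shard.
  my $user_dbh = dbh_for_user(42);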

ichudov

Aug 31, 2011, 11:09:29 PM
to perlbal
Brad, thanks a lot.

Regarding sharding, the very thought gives me shivers. I recall
reading those snickering articles about the "fate worse than death" of
a certain social website that is trapped in sharding, etc.

I wish I had their traffic, but at the same time, I would like to
think through a better approach.

Is there another solution, perhaps involving another database vendor
(not MySQL and perhaps not free software), that would let me get
tremendous throughput out of a single database instance?

If I pay sixteen bazillion dollars, can, say, Oracle provide me with
some solution?

What does eBay do?

thanks

i

Brad Fitzpatrick

Aug 31, 2011, 11:12:49 PM
to per...@googlegroups.com

If your database isn't already running on a high-end SSD, do that first. I never had that option.

MySQL is fine. It's usually your schema that matters, not your software.
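
For example, a quick way to sanity-check a query plan from Perl (a
sketch only; the DSN, credentials, and query are hypothetical):

  #!/usr/bin/perl
  # Run EXPLAIN on a suspect query; "type: ALL" with a big "rows"
  # estimate usually means a missing index.
  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect('DBI:mysql:database=algebra;host=localhost',
                         'user', 'password', { RaiseError => 1 });

  my $sth = $dbh->prepare(
      'EXPLAIN SELECT * FROM sessions WHERE cookie = ?');
  $sth->execute('example-cookie-id');

  while (my $row = $sth->fetchrow_hashref) {
      printf "table=%s type=%s key=%s rows=%s\n",
          map { defined $row->{$_} ? $row->{$_} : 'NULL' }
              qw(table type key rows);
  }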

ichudov

Aug 31, 2011, 11:15:37 PM
to perlbal
Brad, yes, I am already running algebra.com's MySQL database on an
Intel SSD.

To say that it is AWESOME would be a huge understatement.

Thanks for pointing out the schema issue. It is a very interesting
angle of approach.

i

Ask Bjørn Hansen

Aug 31, 2011, 11:42:32 PM
to per...@googlegroups.com

On Aug 31, 2011, at 19:48, ichudov wrote:

> I have a website project algebra.com. Right now it gets about 61,000
> visits per day, and in peak times, serves about 55 requests per second
> (requests for HTTP objects, not pageviews).
>
> This is all good and well. When this goes on, it uses about 30% of CPU
> on my server, which makes me think about "scaling up".

What is using 30% CPU? Perlbal? That's pretty crazy for ~60 requests a second. Are you using the XS header parser?

If the database: Make your queries more efficient.

If the web application: Buy more web servers (or make the code more efficient).


- ask

David Davis

Aug 31, 2011, 11:46:47 PM
to per...@googlegroups.com

He may be referring to his entire stack taking 30% CPU, not just Perlbal alone :)


Also, install IO::Epoll if you haven't already.

--
 
David Davis | ☄ Software Engineer - Dev Manager | http://xant.us/

ichudov

Sep 1, 2011, 6:24:05 AM
to perlbal
Yes, I am referring to everything -- perlbal and apache -- taking 30%
of CPU.

i

ichudov

Sep 1, 2011, 6:24:40 AM
to perlbal
Dave, would IO::Epoll automatically be used by Perlbal? What exactly
is the advantage that it gives?

thanks

On Aug 31, 10:46 pm, David Davis <david.da...@gmail.com> wrote:
> He may be referring to his entire stack is taking 30% cpu, not just Perlbal
> alone :)
>
> Also, install  IO::Epoll if you haven't already.
>
> --
>
> David Davis | ☄ Software Engineer - Dev Manager | http://xant.us/

Leo Lapworth

Sep 1, 2011, 7:03:42 AM
to per...@googlegroups.com
On 1 September 2011 03:48, ichudov <ich...@gmail.com> wrote:
> I have a website project algebra.com. Right now it gets about 61,000
> visits per day, and in peak times, serves about 55 requests per second
> (requests for HTTP objects, not pageviews).
>
> This is all good and well. When this goes on, it uses about 30% of CPU
> on my server, which makes me think about "scaling up".

If I'm reading this right, at the moment you run everything (perlbal/apache/mysql) on one server?

If so, you can split it out quickly along these lines:

1) perlbal on one server (or two, with a failover HA setup)

2) apache on one or more servers (better yet, have a look at Starman+Plack if your app is in Perl); perlbal can then load-balance across these, adding more as you need them. This works really well if your CPU bottleneck is in resizing images etc. rather than in the database. You might also want to consider memcached for caching database queries across multiple machines (rough sketch after this list)

3) have your database on another server
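
As a rough sketch of the memcached idea in (2) (the servers, query,
and key scheme below are invented for illustration):

  #!/usr/bin/perl
  # Cache an expensive query result in memcached so that every web
  # server shares it instead of each hitting MySQL.
  use strict;
  use warnings;
  use Cache::Memcached;
  use DBI;

  my $memd = Cache::Memcached->new(
      { servers => ['10.0.0.20:11211', '10.0.0.21:11211'] });

  my $dbh = DBI->connect('DBI:mysql:database=algebra;host=db1',
                         'user', 'password', { RaiseError => 1 });

  sub problem_count_for_topic {
      my ($topic_id) = @_;
      my $key   = "problem_count:$topic_id";
      my $count = $memd->get($key);
      return $count if defined $count;

      ($count) = $dbh->selectrow_array(
          'SELECT COUNT(*) FROM problems WHERE topic_id = ?',
          undef, $topic_id);
      $memd->set($key, $count, 300);   # cache for five minutes
      return $count;
  }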

You don't have to do this split all at once, but it might be where you want to get to.

How you do the split-out really depends on what's creating the load.

Hope that helps

Leo

ichudov

Sep 1, 2011, 8:59:47 AM
to perlbal


On Sep 1, 6:03 am, Leo Lapworth <l...@cuckoo.org> wrote:
> On 1 September 2011 03:48, ichudov <ichu...@gmail.com> wrote:
>
> > I have a website project algebra.com. Right now it gets about 61,000
> > visits per day, and in peak times, serves about 55 requests per second
> > (requests for HTTP objects, not pageviews).
>
> > This is all good and well. When this goes on, it uses about 30% of CPU
> > on my server, which makes me think about "scaling up".
>
> If I'm reading this right at the moment you run everything
> (perlbal/apache/mysql) on one server?
>
> If so you can split it out quickly along these lines..
>
> 1) perlbal on one server (or two with fail over HA setup)
>
> 2) apache on one or more servers (better yet have a look at starman+Plack if
> your app is in Perl), perlbal can then load-balance across these, adding
> more as you need them. This works really well if your CPU bottleneck is in
> resizing images etc rather than in the database. You might want to consider
> memcached for caching database queries across multiple machines

I standardized on Apache plus mod_perl a long time ago, and I am
generally happy with it. Additionally, I have plenty of other
Apache-based websites.

> 3) have your database on another server

Yep. Good point.

I also have two tables, one for cookies/sessions, and one for "binary
cache", which is a cache for various precomputed objects. I would
split those out into separate databases as well. They have no
relational ties to any other tables.
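
Roughly what I have in mind for that split, as a sketch (the DSNs and
the helper below are illustrative, not my actual code):

  # One DBI handle per role, so the session and binary-cache databases
  # can move to their own servers later without touching the main
  # relational schema.
  use strict;
  use warnings;
  use DBI;

  my %dsn = (
      main     => 'DBI:mysql:database=algebra;host=db-main',
      sessions => 'DBI:mysql:database=sessions;host=db-sessions',
      bincache => 'DBI:mysql:database=bincache;host=db-cache',
  );

  my %dbh;
  sub dbh_for {
      my ($role) = @_;
      $dbh{$role} ||= DBI->connect($dsn{$role}, 'user', 'password',
                                   { RaiseError => 1 });
      return $dbh{$role};
  }

  # Session lookups and binary-cache reads never join against the main
  # tables, so each can live wherever is convenient.
  my $session_dbh = dbh_for('sessions');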



> You don't have to do this split all at once, but it might be where you want
> to get to.
>
> How you do the split out really depends on what's creating the load

I agree. All good points.

i

ichudov

Sep 1, 2011, 10:13:45 AM
to perlbal
I realized that I have a basic question: is perlbal an inherently
single-threaded application?

Can I somehow take advantage of having many CPUs with perlbal?

Thanks

i

dormando

Sep 1, 2011, 12:16:06 PM
to perlbal
> Brad, thanks a lot.
>
> Regarding sharding, this very thought gives me shivers. I recall
> reading those snickering articles about "fate worse than death" of a
> certain social website that is trapped in sharding, etc.
>

(I seem to be not getting 1/4th of this thread? wtf google groups).

Take note that the "fate worse than death" article was an opinion piece by
a vendor attempting to sell something. I've seen people fear simple
sharding so much that they end up implementing it wrong when the time
comes.

SSDs and strategic use of magnetic disks will get you a *long* way, though.

dormando

Sep 1, 2011, 12:18:20 PM
to perlbal
> I realized that I have a basic question. Is perlbal an inherently
> single threaded application?
>
> Can I somehow take advantage of having many CPUs with perlbal?

Perlbal is single-process, but you can run many perlbals: LVS or HAProxy
as a "stupid/fast" L4 load balancer in front of a set of perlbals, which
then do the more intelligent things.

You can also serve static content via another hostname/IP combo that
talks directly to nginx or similar.

Make sure you have Perlbal::XS::HTTPHeaders installed; `perldoc
Perlbal::Manual` has some sections on optimisation. Enabling backend
keepalives, client keepalives, etc. can drastically lower perlbal's CPU
usage.
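
For example, the persistence knobs look something like this in the
config (a sketch; the service and pool names are placeholders, and the
pool would be defined elsewhere):

  CREATE SERVICE balancer
    SET role    = reverse_proxy
    SET listen  = 0.0.0.0:80
    SET pool    = webservers
    # reuse client and backend connections instead of reconnecting
    SET persist_client  = on
    SET persist_backend = on
    # how many idle backend connections to keep around
    SET backend_persist_cache = 16
  ENABLE balancer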

David Davis

Sep 1, 2011, 4:53:17 PM
to per...@googlegroups.com
Yes.

Danga::Socket (used in Perlbal) will use IO::Epoll if it's installed.
It falls back to a select-based poll if it isn't.

You will see the benefits when you have a large number of connections.
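
A quick way to check whether your perl can see it (my own snippet, not
part of Danga::Socket):

  use strict;
  use warnings;

  if (eval { require IO::Epoll; 1 }) {
      print 'IO::Epoll ', IO::Epoll->VERSION, " is installed\n";
  } else {
      print "IO::Epoll not found; Danga::Socket will fall back\n";
  }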

--
 
David Davis | ☄ Software Engineer - Dev Manager | http://xant.us/
