How much does it really cost?


Eran Sandler

unread,
Jan 5, 2011, 11:27:13 AM1/5/11
to iltec...@googlegroups.com
At the last tech talk we had at Outbrain (thanks for hosting, guys!) I talked with Ori about some of the cloud vs. DC questions (yes, again :-) ), and he mentioned that a strong DB machine costs about $5,500 to buy.

The configuration is:
- 8-core Nehalem-based CPU (with or without hyperthreading)
- 32GB RAM
- RAID 6 (I assume it's 4 or so 1TB drives)

I set out to compare that single machine (at least for now) to a comparable machine on AWS.

To normalize things to AWS terms I've computed everything on a cost/hour (or cost/mo in certain cases) so that I can come up with something comparable.

There is no exact match for this configuration in AWS. There are two that come close:
- High-Memory Double Extra Large (4 cores, 34.2GB RAM, 850GB local storage)
- High-Memory Quadruple Extra Large (8 cores, 68.4GB RAM, 1.7TB local storage)

I'm missing a couple of numbers to complete this but this is what I have thus far:

Cost of the machine per hour over 3 years:
DB Machine DC: $5500 / 36 months / 30 days / 24 hours = $0.21/hour
DB Machine AWS (double extra large): ($4000 3yr reservation / 36 months / 30 days / 24 hours) + $0.34 = $0.49/hr
DB Machine AWS (quadruple extra large): ($8000 3yr reservation / 36 months / 30 days / 24 hours) + $0.68 = $0.99/hr
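To make these numbers easy to rerun with your own quotes, here is a minimal sketch of the amortization above in plain Python. The prices are just the rough figures quoted in this thread (machine price, 3-year reserved upfront fees and hourly usage rates), not official AWS pricing.

```python
# Rough cost-per-hour amortization over 3 years, assuming 36 months of
# 30 days, as in the calculation above.
HOURS = 36 * 30 * 24  # 25,920 hours

def dc_hourly(machine_price):
    """Raw hardware cost per hour for a purchased machine (no power/colo/ops yet)."""
    return machine_price / HOURS

def aws_reserved_hourly(upfront, usage_rate):
    """3-year reserved instance: upfront fee amortized per hour, plus the hourly usage rate."""
    return upfront / HOURS + usage_rate

if __name__ == "__main__":
    print(f"DC DB machine:             ${dc_hourly(5500):.2f}/hr")                   # ~$0.21
    print(f"AWS double extra large:    ${aws_reserved_hourly(4000, 0.34):.2f}/hr")   # ~$0.49
    print(f"AWS quadruple extra large: ${aws_reserved_hourly(8000, 0.68):.2f}/hr")   # ~$0.99
```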

Before you get too excited :-), these are the raw numbers. What I'm still missing on top of that on the DC side is:
- Power cost per hour (depends on how many watts your power supplies draw)
- Cost of colo (the space you take - I know a 40U rack, at least from prices on the net, costs ~$1000/mo, and these database beasts are usually 4U, if I'm not mistaken)
- Cost of fixed, static hardware (firewalls, network switches, UPS, cabling the whole network, etc.) normalized per hour
- Cost of the salary of an ops guy (or, if you are using managed hosting, the cost of the managed hosting)

On top of that we need to add the cost of bandwidth, so I also need:
- Cost of bandwidth normalized to per GB per month (so we can compare to AWS, although I'm quite sure this is going to be one of AWS's weaker points)

Once I get more data from you on some of these numbers, I'll put it all in a nice spreadsheet so we can talk about it and see.

Of course, we shouldn't forget that there are other pros and cons related to the network topology and hardware configuration which are different from a DC standpoint compared to AWS.

Get those numbers flowing!

If you'd rather not share some of your numbers publicly, feel free to contact me directly and I'll add them anonymously.

Thanks,
Eran

Ori Lahav

unread,
Jan 5, 2011, 3:44:03 PM1/5/11
to iltec...@googlegroups.com
Eran
you are probably measuring how much time it will take me to answer that - right? :)
with my speed - I could work for AOL answers? :)

BTW, as this is part of my ILTechTalks presentation I checked and rechecked the numbers - when I built the spreadsheet for it, it was hard for me to believe the differences are that big.
 
Anyway, let me fill in some of the numbers and correct some of your assumptions.
Outbrain's new machines are the new Dell C series, which are 2U chassis with 4 machines in them - a kind of mini-blade.

In each of these machines you can fit whatever hardware configuration suits your needs.
So... first correction - our DB machines take only 1/2U, not 4U as you assumed. (Theoretically I could put 80 of them in a rack if I had the power.)
As we have many MySQL slaves (again, size does matter here), we don't really care if one slave dies (not that it ever happened) - the load balancers will route the traffic to the other slaves. So we configure the slaves with RAID 0 (striping) and gain both speed and space at a lower hardware cost.

You can assume that when calculating the yearly rate of such a machine I included the power cost (according to what I measured in our day-to-day usage) and the cost of the network setup (its port in the switch - actually 2 ports, since we have network redundancy as well).

Remember that over time the 'one time' costs are much, much lower than the monthly recurring costs. For example, spending ~$10-20K on a DC build-up project once a year is marginal compared to the recurring costs (colocation/power/network) and the actual hardware cost spread over 3 years.
So... under the marginal costs I put:
- Build-up projects.
- The day-to-day care of the hardware (AKA the SysAdmin myth - I will explain later).
- Load balancers (we use Linux-based ldirectord and HAProxy on machines that cost ~$1000, for all of our traffic).

(I'm sure Netflix spent a lot more than this to 're-invent' the IP stack in order to fit their applications to the cloud's lousy network.)

About the SysAdmin myth - I measured that too (as much as possible). To handle ~200 servers we had 2 Ops engineers, one of whom handled the datacenter setup. I asked him: "In your day-to-day work (not when we're in a build-up project), how much of your time do you spend on the actual hardware?" The answer was 25%, which is 12.5% of the whole Ops team's time.
So... whether you put your setup on EC2 or in your own colo, you still need to configure/install the infrastructure applications, monitor the beast, configure load balancing, back things up, make sure backups can actually be restored, clean logs, etc.... all the stuff that sysadmins do. If you think you won't need sysadmins, then your developers will need to do all that stuff - which is nice as a philosophy, but it's still the same cost.

Bottom line - in my presentation I demonstrate the math behind it at a scale like Outbrain's (which is not small, but definitely not as big as the Internet giants').
Compared to EC2 we saved a nice few hundred thousand dollars a year, which we could spend instead on R&D or BD people who push the business forward.

In our business, serving is a cost center. If the business/R&D side is measured by RPM (revenue per 1000 hits) and needs to bring it up, we are measured by CPM (cost per 1000 hits) and work to bring it down. At big scale, if you pay attention to it, you save tons of $$$ and increase your margins.

Anyway, my presentation covers all this so feel free to ask for it if you are interested.


Eran Sandler

unread,
Jan 5, 2011, 6:05:27 PM1/5/11
to iltec...@googlegroups.com
Dude, talk to me with numbers, either by replying or sending me the presentation (or better yet, putting the presentation on slideshare for all to see!)

I do have some additional questions and assumptions that I'm not sure are covered in your presentation:
  • If the ops work is 12.5% of the Ops team's time, then it's 12.5% of the salary - so we should add that.
  • If it's 1/2U, then if the original assumption of $1,000/mo per 40U is correct, it's ~$0.017/hour for the physical hosting (see the sketch below).
  • I didn't include the fact that basic monitoring is covered by default, with no configuration or storage needs, by AWS CloudWatch (at a 5-minute interval). At AOL Answers we were lucky to use a shared Nagios installation (but installing and configuring Nagios takes loads of time; it's quite a pain). We also needed a dedicated machine running Cacti to collect, store and plot our basic and custom metrics, so setting that up even for the basic stuff takes more machines and setup time, not to mention ongoing care.
  • You didn't give the rough cost of the basic hardware that is needed, like a firewall for the whole site (which you get for free on AWS) and network switches.
  • Regarding load balancers - I assume you are using some high-availability configuration (probably Linux-HA)? That takes a bit more time to configure (which makes the setup cost a bit higher). On AWS, the basic front-facing load balancer - including SSL offloading, health checks and an API to add and remove servers - is ~$18/mo excluding traffic (which you pay for anyway), with the ability to fail over to a different Availability Zone in the same region.
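Spelling out the rack-space arithmetic from the second bullet (a sketch only - the $1,000/month 40U rack is the placeholder figure from earlier in this thread, not a vendor quote):

```python
# Back-of-the-envelope share of rack cost for a machine of a given height.
RACK_PRICE_PER_MONTH = 1000.0   # assumed 40U rack price from this thread
RACK_UNITS = 40
HOURS_PER_MONTH = 30 * 24

def space_cost_per_hour(units_used):
    """Share of the rack price attributable to `units_used` rack units, per hour."""
    per_unit_month = RACK_PRICE_PER_MONTH / RACK_UNITS      # ~$25 per U per month
    return per_unit_month * units_used / HOURS_PER_MONTH

print(f"1/2U machine: ${space_cost_per_hour(0.5):.3f}/hr")   # ~$0.017/hr
print(f"4U machine:   ${space_cost_per_hour(4):.3f}/hr")     # ~$0.139/hr
```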

I appreciate you taking the time to give the long answer :-)

Eran

Eran Sandler

unread,
Jan 6, 2011, 1:02:43 AM1/6/11
to ni...@mailnicks.org, Ori Lahav, iltec...@googlegroups.com
I see we still don't have the complete numbers, but the raw $1.15 vs. $0.49-$0.99 doesn't seem like 2-5 times, does it?
The more expensive EC2 machine I gave as an example has 68.4GB of RAM, which should make a comparable physical machine cost significantly more than $5,500 - which means the $1.15 number you gave me should be higher.

That's why I want to get a clearer picture. Sometimes a company can choose to spend 2-5 times more just to be able to handle bursts (if that's its application profile), so I really would like to know, once and for all, the actual cost of a DC where you lease hardware and/or managed services vs. a DC where you buy the hardware and provide the rest of the services yourself.

I also did not include the fact that you need hardware replacements if you are using a DC where you buy hardware, which means you also need to keep some stock in place (or somewhere really quick to acquire the hardware from).


BTW, did you notice we are off the list now? Is that on purpose? I'm assuming not, so I'm putting it back on the list.

Eran

On Thu, Jan 6, 2011 at 7:54 AM, Nir Yeffet <ni...@mailnicks.org> wrote:
True, that's why you see the difference - 7 times vs. 2-5 times. The point remains: EC2 is more expensive when you need to scale. It breaks even before full-blown infrastructure (network, etc.) is needed; when you are small, that's exactly where you save. However, it is dangerous if you marry it and then need to scale.

On Wed, Jan 5, 2011 at 9:21 PM, Ori Lahav <ola...@outbrain.com> wrote:
Nir,
A 24/7 NOC is a good example of something that is considered a serving cost, but it's not relevant for this discussion as you have to do it regardless of whether it's EC2 or not.
A NOC is a significant cost if you choose to run one.
BTW, we chose not to - just because we are distributed over 2 continents.

--- original message ---
From: "Nir Yeffet" <ni...@mailnicks.org>
Subject: Re: How much does it really cost?
Date: 6th January 2011
Time: 2:12:25 am

FYI, my numbers did include everything you mentioned: 24x7 NOC and monitoring, firewalls, load balancers, switches, cables, SSL and way more. Basically all of operations (but no product development) is included: I took the bottom-line number.

In our calculations, depending on the application, EC2 came out between 2 and 5 times more expensive. However, we do sometimes use EC2 for bursts and quick capacity needs; it is good for exactly that.

Ori Lahav

unread,
Jan 6, 2011, 2:20:45 AM1/6/11
to iltec...@googlegroups.com
Eran
I need to clean my presentation of Outbrain's detailed business facts and put it on the ILTechTalks site. I promise to do it soon.
But here are some of the calculations I've done.
Such a DB server costs me $2,500 a year including hosting/power/network, where the closest EC2 instance costs $17,500 a year at hourly (on-demand) rates and $11,000 on the reserved scheme (which takes away the essence of running in the cloud). Do the math for 20 DB servers.
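Doing that math explicitly, taking the yearly figures above at face value (a sketch using this thread's numbers, not quotes):

```python
# Yearly cost per DB server, per the figures above, scaled to a 20-server tier.
COLO_PER_YEAR = 2_500            # all-in: hosting/power/network
EC2_ON_DEMAND_PER_YEAR = 17_500
EC2_RESERVED_PER_YEAR = 11_000
SERVERS = 20

print(f"Colo:          ${COLO_PER_YEAR * SERVERS:,}/yr")           # $50,000
print(f"EC2 reserved:  ${EC2_RESERVED_PER_YEAR * SERVERS:,}/yr")   # $220,000
print(f"EC2 on-demand: ${EC2_ON_DEMAND_PER_YEAR * SERVERS:,}/yr")  # $350,000
print(f"Yearly delta vs. reserved: ${(EC2_RESERVED_PER_YEAR - COLO_PER_YEAR) * SERVERS:,}")  # $170,000
```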

Let me answer your questions:
1. Let's roughly assume (for ease of calculation) that an Ops engineer costs the employer ~$100K/year, of which only $25K goes to handling the hardware. Sounds like much? Not when you divide it by 200 servers: that's $125 per server per year, which is ~$0.014 per hour - less than your coffee budget, i.e. marginal (see the sketch after this list).
2. The 1/2U of space is included in my calculation of the yearly server cost.
3. Monitoring - to match the basic monitoring CloudWatch offers for free, you can use the default Nagios modules, which are pretty quick to set up (marginal). But we both know that for real, meaningful monitoring you want to watch things like Java heap space, user memory vs. system vs. filesystem cache; you want to monitor your Tomcat response time, the time it takes to index documents, gaps in queues, memcached hit rate, etc.... you get the point. There is no way around having a monitoring system and investing in it - and again, it's the same investment in the cloud or in colocation.
4. We are using Linux-HA between 2 load balancers (HAProxy). It's hard to quantify how much time we invest in it, but if you take out the one-time setup (maybe a day's work), we put in the same configuration time as you would put into configuring the AWS load balancer. Adding a server to the config is the same in both.
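And the ops-labor arithmetic from item 1, written out (all inputs are the rough assumptions above):

```python
# One Ops engineer at ~$100K/yr total cost, ~25% of the time on hardware,
# spread over ~200 servers.
ENGINEER_COST_PER_YEAR = 100_000
HARDWARE_TIME_FRACTION = 0.25
SERVERS = 200
HOURS_PER_YEAR = 365 * 24

hardware_cost_per_year = ENGINEER_COST_PER_YEAR * HARDWARE_TIME_FRACTION  # $25,000
per_server_year = hardware_cost_per_year / SERVERS                        # $125 per server per year
per_server_hour = per_server_year / HOURS_PER_YEAR                        # ~$0.014 per server-hour

print(f"${per_server_year:.0f}/server/year, ${per_server_hour:.3f}/server/hour")
```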

Regarding the thread that developed below with Nir:
I think the main message is that there is no clear cut. For some applications at some sizes AWS will be more cost-effective; for some it's 2 times more expensive, and for some it's 5 or 7 times more expensive. There is no way around it - you will have to do this math for your own system; no one can give you the numbers. The numbers also change significantly as you grow, so you need to factor that into your considerations.
What I recommend to everybody is to redo the datacenter cost exercise every once in a while as they grow, to see if they are crossing the point where it makes sense to shift to a DC.

I know some people who can help you do this math for your system.
If you want to do a lighter exercise just to get a feeling for it - let's talk.

Ori 

Eran Sandler

unread,
Jan 6, 2011, 2:47:07 AM1/6/11
to iltec...@googlegroups.com
Thanks for the data Ori!

This means it's ~$7,500 for 3 years, or ~$0.29/hr.
This also means that if such a machine costs ~$5,500 for 3 years, the overhead on top of buying the machine is ~$2,000 over 3 years.

The numbers you showed are for when you have a 200-server configuration. I assume (and we can try to reach out to an AWS sales rep) that for 200 DB instances such as these you won't pay the same as if you were running one.

How much would 1 cost you? I'm assuming more than $2500/yr.

The whole point of this exercise is to see where the sweet spot is and where the curve changes.

Eran

Ori Lahav

unread,
Jan 6, 2011, 7:33:10 AM1/6/11
to iltec...@googlegroups.com
In general - you are correct.
The graph of cost vs. growth behaves differently in the two setups:
In AWS it's linear - the more you grow, the more you pay, linearly.
In colocation it's sub-linear - the more you grow, the less you pay per unit of growth. The opposite of income tax (mas hachnasa) :)
 
You are right - for one server the yearly calculation is roughly: $6,000 (or more) for the machine (a machine is more expensive when you buy 1 and not 20), which is $2,000 a year; add a $500/month 1/2-cabinet cost * 12 = $6,000; add 2 switches at $1,000 each (in a small setup it's worth using low-end switches), i.e. 2,000/3 = ~$660 a year; and let's say power will cost you an additional $200 a month per server, which is $2,400 a year. You get:
2,000 + 6,000 + 660 + 2,400 = ~$11,060 a year for a single server if you run it on its own.
But when you add the one-time setup costs for this one server, it may come to, say, $17K.
When I do this math now, I'm surprised that we roughly break even with AWS even on 1 server - and obviously for hundreds it's much, much more effective.
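Here is that single-server, stand-alone estimate as a sketch (all numbers are the rough figures from this post):

```python
# Yearly cost of one colo server run on its own, per the figures above.
machine_per_year  = 6000 / 3       # $6,000 machine amortized over 3 years = $2,000
half_cabinet_year = 500 * 12       # $500/month for a 1/2 cabinet = $6,000
switches_per_year = 2 * 1000 / 3   # two $1,000 switches over 3 years = ~$660
power_per_year    = 200 * 12       # $200/month power = $2,400

total = machine_per_year + half_cabinet_year + switches_per_year + power_per_year
print(f"~${total:,.0f}/yr for a single stand-alone server")   # ~$11,067 (close to the $11,060 above)
```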
 
As for referring to an AWS rep - it's a free market; there are also reps for colocation vendors, network gear vendors, hardware vendors, etc. When you are big, you have leverage with all of them (those of you who have met me in person know what I'm talking about :)).

If you are cost-sensitive (everybody should be), you should do this math for yourself and use prices from people who actually use colocation.

Ori

Lior Sion

unread,
Jan 6, 2011, 10:34:39 AM1/6/11
to iltec...@googlegroups.com
Just a thought: how about creating a simple Excel sheet that has all the expenses laid out, ready-made, for both options, so you just fill in the blanks? Not only would it be much easier for everyone to follow, we could later add different types of clouds into the mix and see the difference.
--
thanks,
Lior Sion

Skype: sionlior | GTalk: lior.sion

Zivo

unread,
Jan 9, 2011, 2:25:36 AM1/9/11
to ILTechTalks
Ori,
I'll be happy to see your presentation, but I suspect you are missing a few aspects of cloud vs. DC:
1- Opex vs. Capex.
2- Flexibility / burst / peak.
3- Attention of your management team.
4- Downgrade (we don't only grow :-( ).
5- Cost of spares and/or 4-hour / next-business-day HW replacement.
6- Cloud / AWS is NOT architecture-transparent! Part of the cloud efficiency trick is to build your architecture for the cloud. Try to stick to the simple building blocks and avoid the 64GB built-in-RAID monsters.
7- With all due respect to Linux, your load-balancing solution will not survive the first attack; with AWS I don't have these concerns.

Last but not least - the cost of mistakes.
I've been managing colo / DC / infrastructure for the last 15 years!!! I can't recall a startup / company that never (at some point in its life cycle) made an overestimation and bought extra or wrong HW.

To cut a long story short -
for a startup / fast-growing / changing-direction / unknown situation /........ no doubt - AWS :-)
If you can justify $300K+ of your own HW costs - start thinking about your own servers.

Ori,
You should calculate not just the theoretical numbers - try to sum all the REAL numbers, the real expenses you had in that domain over the years (including the mistakes...). (Did you ever fly to see/visit the DC? Did you ever buy the wrong HW? Capacity plans for a marketing campaign that never came?)

Ziv.

Ori Lahav

unread,
Jan 9, 2011, 5:16:28 AM1/9/11
to iltec...@googlegroups.com
Hi Zivo - long time...

Let me answer your questions one by one.

1. Opex vs. Capex - What is so wrong about holding CAPEX if it brings value to the company? Why do you buy computers or chairs for programmers? Just because it brings value! Now the name of the game is how cheaply you can create that value. My experience and measurements show that at scale it is much cheaper to hold the CAPEX (as I listed above) than to have it as OPEX. I'm talking about several hundred $K, which for sure justifies all of that.

2. Flexibility / burst / peak - At some point in your growth, your bursts and changes fit nicely into the headroom that you keep anyway for DR (disaster recovery) etc. I agree that when you are small every little burst is very significant, but when you grow it's no longer a consideration - it gets swallowed by the headroom you have. And yes, even while keeping that headroom, we are by far cheaper than having all of it in AWS (as I showed above).

3. Attention of your management team - I'm not sure what you mean by management, but in Outbrain it's the role of the Ops team to keep management and business people out of capacity considerations. We talk once a month (1/2 hour) to make sure Ops is aligned with the business plans and pipeline, and it's Ops's task to make sure the business runs without interruption. When people count on each other (even with an ocean between us) it works perfectly.

4. Downgrade (we don't only grow :-( ) - As I said above, if your business is spiky, maybe the cloud is the place for you. For instance, if you are a hedge fund and need to crunch numbers once a month - go to the cloud. But if you are an internet business that continually attracts more users and brings them more value - you grow. If you are NOT, then you are heading in the wrong direction and should make a pivot. And if you are risk-averse and always thinking about what will happen when you shrink, you should reconsider your path as an entrepreneur.

5. Cost of spares and/or 4-hour / next-business-day HW replacement - Again, that's irrelevant when you grow. Outbrain holds a minimal amount of spares, only for critical machines, and we barely buy support (only for very few critical machines). For most of the machines we don't take any support, since when you grow, losing a machine doesn't affect your service much - it can wait for the warranty replacement when it comes. Frankly, let me tell you something about hardware faults: vendors have been producing servers for ~30 years. HW faults are so rare that this is really a minor consideration when your system is big. Outbrain has been running a datacenter for nearly 4 years and I can count on less than one hand the times we needed Dell support on site.

6. Cloud / AWS is NOT architecture-transparent - This is right, but... dangerous. If you build your architecture for the cloud, it's a lifetime marriage; it becomes harder and harder to shift out. When you grow, you find out that to do the same job a DC will be cheaper than the cloud, and then you will also have the cost of migrating the architecture. Another point to consider.

7. Network gear - What do you think F5 has inside the box? Linux! They just moved the software to lower levels (the kernel or the chips) to get better performance. In my experience, with any web technology, open source is advancing much faster than any vendor (this applies to everything: DB/caching/search/network/etc.). One thing to take into consideration in the cloud: because you are sharing your infrastructure with others, more hackers have motivation to attack that infrastructure than when you are on your own - and when they try harder, they will eventually get in.

8. True - overestimations happen, performance improvements happen too, and you do find yourself with over-capacity from time to time. In fact, you always want some over-capacity so the business can run forward freely. Yet, again, when the business is constantly growing, you will fill it pretty fast. That's a fact I can state even without managing DCs for 15 years!!!

I take your statement:

"For startup / fast growing / changing direction / unknown
situations /........no doubt- AWS:-)"

and just cut out the "fast growing". If you are growing in one direction... up, and you are cost-aware, you will soon find out that you need many servers and the cloud is just sucking up your serving budget - all the revenue you make goes directly to Amazon and... nothing is left to run the business. And yes, when you grow you need your own setup.

So... my bottom line is: where do you want to be? If you are planning to stay small - go ahead, play on Amazon's cloud. If you are planning to grow (I hope you are), make sure you know when is the right time to move out.

You all need to do the exercise - what is the lowest cost for your business at any point in time. You might find very interesting facts about AWS.


Zivo

unread,
Jan 9, 2011, 7:33:27 AM1/9/11
to ILTechTalks
Ori,

Just a quick note...

"headroom that you keep anyway for DR"

Remind me what is DR? :-)

THIS IS (one of) the differences between the calculation you made (server vs. server) and the REAL calculation I'm doing - total cost of service.
With AWS, my DR is a "soft" image stored on S3.
It takes me 40 seconds!!! to have it up and running.
Not only do I not keep any spares, I can turn servers down during low load and introduce new servers at peak hours.

And yes, you should think about how you get out of AWS and keep that in mind (architecture-wise), but first you must be large enough.

I think one of the questions (although Lior didn't ask it) is the number I tried to give: how large should you get before going independent?

Assaf yardeni

unread,
Jan 9, 2011, 11:44:32 AM1/9/11
to iltec...@googlegroups.com, nat...@milford.io
Nathan, Ori,

Thanks for the invitation and the interesting presentation. It definitely gave me some points to think about during the last steps of our design and DC setup.


Assaf Yardeni

Ori Lahav

unread,
Jan 10, 2011, 4:09:00 AM1/10/11
to iltec...@googlegroups.com
Well - you are right - you don't have to keep DR headroom if you are on AWS. Well... partially right.
As you know, in a web system that wants to be responsive, the data you need for serving has to be wisely cached in memory - whether it's memcached, the DB cache or even the filesystem cache. A system that is not warmed up makes for a rather sluggish DR, so getting your service properly back up takes a few hours, not 40 seconds. If you want to keep your DR warm you have to run active-active and keep traffic flowing to both sites so that your whole data/caching layer stays hot. If you do that - you don't save much.
However, this is very system- and business-specific - some businesses don't need DR at all, some are tolerant of such failures, etc.

what I found amazing is that the cost of hardware usage in AWS is 3x-7x higher so even if I hold DR headroom it is still more financially efficient!
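As a toy version of that claim (illustrative numbers only - a 2x active-active headroom and the low end of the 3x-7x premium mentioned in this thread):

```python
# Even doubling colo capacity for a warm, active-active DR site keeps it
# below AWS if AWS costs 3x (or more) per unit of serving capacity.
colo_unit_cost = 1.0   # normalized cost of one unit of serving capacity in the colo
aws_premium    = 3.0   # AWS at 3x-7x per unit, per the thread; take the low end
dr_headroom    = 2.0   # worst case: a full second site kept warm

colo_with_dr = colo_unit_cost * dr_headroom   # 2.0
aws_no_dr    = colo_unit_cost * aws_premium   # 3.0
print(f"colo incl. DR headroom: {colo_with_dr}, AWS without any DR headroom: {aws_no_dr}")
```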

BTW: the more datacenters you have around the world - the less DR headroom you need to keep. which makes things much more efficient.

It is hard to answer Lior's question, as each of these calculations needs to be done against specific system requirements and business needs.
As Nir said before, for shopping.com AWS is 2-5x more expensive, whereas for Outbrain it's 3-7x more expensive. As I showed above, even on a single DB machine, on a yearly calculation, colocation roughly breaks even with AWS. So things are not that straightforward, and everyone needs to redo their own calculations as they grow. No one can do it for you - but there are people who can sit with you and give you the actual costs of colocation (or any other alternative) because they do it all the time. I will not promote these people here, but you can write me a private email and I'll connect you.

Ran Tavory

unread,
Jan 10, 2011, 7:49:59 AM1/10/11
to iltec...@googlegroups.com
Here's another point to consider. In quite a few shops I've worked with, operations were the bottleneck. 
I know Eran was asking about hardware and ops man-hour costs, but I think developer costs are no less important. Having developers wait for ops, be throttled by ops or find ways around ops (usually it's a combination of all three) is costly both financially and in morale. It's hard to quantify this cost, so don't expect to see numbers, but my feeling is that it's a high one.
Why is the cost of devs being throttled by ops relevant to the discussion at hand? Well, it doesn't have to be this way, but it is in so many cases, b/c what usually comes along with IaaS, and more profoundly with PaaS, is that ops gets out of the critical path, sometimes even out the door (no offense, anyone, please...).
Developers can take care of their own services. For example, if you've used App Engine you'd see that for most of the stuff you're on your own: all the infrastructure is provided by Google and you only need to code your logic. More or less... Granted, there are tradeoffs when coding to App Engine and I'm not going to say it's a perfect or even a reliable system, but my point is that in many cases, if your service can be implemented on top of one of the PaaS systems such as App Engine, then as developers you need to worry less about ops or being throttled by it, and IMO this goes a long way toward developer productivity and happiness.
My personal opinion is that some PaaS offerings are not ready today, but I have no doubt that they will be ready in the next couple of years, just b/c their economics make sense. I don't know whether today you'd be able to get away with something such as App Engine, but I have no doubt that you will, not far from today. And once you do, it's not a CPU-for-CPU calculation; it's a much broader calculation that affects not only the expense bottom line but also the spirit and morale of the company.
IaaS such as Amazon is midway b/w a colo and a PaaS: on one hand, unlike PaaS, it lets you do more or less whatever you want with it (just more expensive on a per-resource basis than a colo), but on the other hand it abstracts away much of the traditional operations kind of work for you. Ori's calculation shows that the traditional kind of work is merely 12.5%, but what I'm saying is that it's not the % that counts, it's the necessity. If you don't have to care about disk mounts, inodes, block sizes etc., then you're less likely to hire a traditional ops guy and more likely to hire a devops guy - someone who programs for a living but also takes an interest in the programmatic operational work such as monitoring, deployments, performance optimizations etc. Every generalization sins, so I don't want to generalize, but what I can say is that from what I've been seeing, it's usually the case that traditional ops tend to fall back on ad-hoc solutions while programmers tend to apply more systematic solutions, so if the balance in the organization tilts towards more devs and less ops you tend to have a more efficient team. That's how I see devops.
The promise of devops is that ops and devs work closely together and in my view, "closely" means the same person. Having a high level infrastructure, PaaS or even IaaS makes this closeness so much easier. Having a colo makes this closeness harder. 
A "private cloud" is another middle way. Regardless of the term being a misnomer, when I say private cloud I just mean "a very good infrastructure running on my own hardware". It's a middle way b/c it somewhat abstracts you away from repeatedly dealing with low level settings and let you programmatically or declaratively do that. But the cost of such a private cloud is far grater than just the CPUs and the 12.5% mentioned. I can't qualify it but I've seen a few high profile companies that quantify it as 70% of their work effort give or take. Not negligible. It's most definitely worth it for companies such as Google or Amazon, but is it worth it for your company? The calculation just got messier...

--
/Ran

Ori Lahav

unread,
Jan 10, 2011, 8:20:36 AM1/10/11
to iltec...@googlegroups.com
ahhhh... now you have opened a great subject for the next Reversim episode :)

Ori Lahav

unread,
Jan 10, 2011, 10:53:24 AM1/10/11
to iltec...@googlegroups.com
Guys
As promised - I uploaded my presentation here:

It's the second attachment.
There isn't much detail in it, as I usually deliver the details verbally when presenting, but... you can get the general idea.

Ori

Eran Sandler

unread,
Jan 11, 2011, 1:48:02 AM1/11/11
to iltec...@googlegroups.com
First of all, I totally agree with everything you said here :-)

I wanted to save this discussion for later, after we clear up the more "generic" costs discussion and move on to the unquantifiable items.

Connecting this with our previous discussions about "generalists" or "athletes" (as someone else mentioned), actually being a devops guy, IMO, makes you a far better developer. 

When you actually see (and feel) the pain of managing something you tend to find better solutions for it (and this goes a bit to your point about solving things more systematically). Also, it gives you the real full-stack approach of actually knowing how your system runs below.
When you know how the OS is configured and behaves you get a far better understanding of your code.

I've seen more than my fair share of developers saying "that's the ops team's problem" about things like easier deployment configuration, backups, storage considerations and so on, and this is why I believe that a good dev team that builds server apps should also do the ops, to some degree (everything but physically installing the machine or its parts :-) - although I would tend to say they should do that too :-) ).

That's where IaaS vendors come in to make things easier for the devs (and this is actually one of the rationales for doing this at Amazon in the first place - that, and the fact that they work in "two-pizza" teams).

About "private clouds" or how its suppose to be called - a VERY smart virtualization platform - we are still at the start of the innovation here with OpenStack (http://www.openstack.org/ - which is just starting) and Eucalyptus (http://www.eucalyptus.com/ - which mimics and actually implements the same APIs as AWS).

Once these technologies are a bit more mature, they will introduce the necessary abstractions in a DC to be able to move back and forth between a vendor such as AWS and a DC, as well as give developers the freedom they need to be more productive.

Sure, a virtualized env doesn't necessarily give the "raw" performance that a non-virtualized environment gives. However, as a friend of mine told me the other day, "constraints are liberating". Although that was said about iOS development, it applies to an environment such as the cloud, whose constraints make your software much better (see the Chaos Monkey tool Netflix wrote so that they always live with random failures - http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html). This is also the Google approach to software: "expect random exceptions in hardware and software and handle them gracefully!"

Eran


Ran Tavory

unread,
Jan 11, 2011, 1:59:50 AM1/11/11
to iltec...@googlegroups.com
BTW Eran, a side note: a "private cloud" does not necessarily mean virtualization. Although OpenStack and Eucalyptus chose the VM approach, it does not have to be this way. At Google we had a kick-ass private cloud (it was not called that, it was just called "a kick-ass infrastructure" ;-) that did not rely on or use VMs. VMs are great for solving trust-boundary issues between co-located apps, varying OSs and similar problems, but they are not a necessity for private infrastructure.
--
/Ran

Boris Shulman

unread,
Jan 11, 2011, 3:24:21 AM1/11/11
to iltec...@googlegroups.com
Regarding private cloud: I totally agree that OpenStack and Eucalyptus are not mature enough, and one of Eucalyptus's drawbacks is that it is not "open" enough. Actually, this is one of the reasons NASA developed Nova, which is the OpenStack compute component. You can find other, more mature technologies in the market that give you the virtualization abstraction layer, like Cloud.com or Platform ISF. Cloud.com is actually part of the OpenStack community and also has an integration with RightScale, so you can deploy from your RightScale account either to the EC2 public cloud or to a Cloud.com private cloud.
There is a community edition of the Cloud.com product which probably provides only limited functionality, so the main problem with those products is the cost; I don't see how small organizations can leverage such products. And anyway, I don't see what advantages a "private cloud" gives a small organization that delivers a single product, because you will still have to maintain all your infrastructure.

My 2 cents..



Eran Sandler

unread,
Jan 11, 2011, 7:10:56 AM1/11/11
to iltec...@googlegroups.com
I guess it fits a company the size of Google better. Of course, if you had the most kick-ass PaaS that could answer all of your needs, you wouldn't need a VM and would just have these machines run the PaaS software.

But then everything becomes more "generic" in nature, and customized services can't be distinguished from generic ones. That's where a VM can be better.
Not to mention that some would argue that if you have an underlying OS that better understands the various needs, it can optimize things like IO access and network access better on machines that host more than a single type of app.

Eran

Ori Lahav

unread,
Jan 11, 2011, 4:42:04 PM1/11/11
to iltec...@googlegroups.com
Eran/Ran,
I agree with most of what you've said above, and I won't repeat it.
I do think that conceptually there should be a distinction between the production environment and the dev environment. At scale, the production env is much bigger than dev and hence should be as lean and as cost-effective as possible. I disagree that by having a lousy physical infrastructure you will build more robust and fast software to compensate for it; your software should be robust and fast anyway, and your hardware configuration should be the most effective (cost and performance) for the job. It's an iterative, ongoing process between the ops side and the dev side (even if it's the same person :))

Dev environment - frankly, at Outbrain I think we should get better at this. I think the optimum is for developers to be able to get machines on demand from a designated pool and use them for dev purposes. VMs usually make that easier, but if you want the developers to 'feel the metal' as much as they can, I guess a kick-ass infra like Ran mentioned is the best thing to give them. But... it does take time to set up something like that.

As for Dev vs. Ops, or DevOps - DevOps is a term everyone defines differently. I see several approaches to it and I haven't yet made up my mind about the best way to approach it. For example, Yedda and Wealthfront have no ops and the developers take care of it, with the hardware part done by a contractor. In bigger companies like Facebook there are Ops guys, but they work within the dev teams, so devs have immediately assigned system assistance for their projects.
there are probably several other schemes.

The way I see DevOps, it's the part that makes Ops guys way more productive using automation/monitoring/culture/etc. I believe that if you strive for the right culture, Ops guys become more like developers (infrastructure as code) and developers become more like Ops guys (monitoring their apps in production, caring about the hardware their app runs on, exporting monitoring hooks, generating the monitoring for their apps, etc.). In the end, I believe Ops are developers with more system skills, the same way a UI developer is a developer with more UI skills.


Ran Tavory

unread,
Feb 11, 2011, 5:46:43 AM2/11/11
to iltec...@googlegroups.com
This thread has been quiet for a while, but I just happened to read an interesting perspective on the subject from Bret Taylor, currently Facebook's CTO and previously FriendFeed's CEO and a Googler.
From http://www.bbc.co.uk/news/business-12406171:

What's the biggest technology mistake you've ever made - either at work or in your own life?

Prior to Facebook, I was the chief executive of a small internet startup called FriendFeed.

When we started that company, we were faced with deciding whether to purchase our own servers, or use one of the many cloud hosting providers out there like Amazon Web Services.

At the time we chose to purchase our own servers. I think that was a big mistake in retrospect. The reason for that is despite the fact it cost much less in terms of dollars spent to purchase our own, it meant we had to maintain them ourselves, and there were times where I'd have to wake up in the middle of the night and drive down to a data centre to fix a problem.

What I realised was that you can't measure the quality of your life in dollars alone. I think that most of the people that worked at FriendFeed would agree that if that part of the company were just taken care of, it would have been worth all of the extra money we would have spent on it.

Very few of the startups I know in Silicon Valley actually purchase their own servers now, they're using these cloud hosting providers, and I wish we had as well.

--
/Ran

Eran Sandler

unread,
Feb 11, 2011, 8:00:02 AM2/11/11
to iltec...@googlegroups.com
Nice find Ran! 

I agree with Bret's notion. It is a pain to do all that and hiring someone to take all that pain away (as in hiring a sysadmin for doing that) may not be much fun for that person either (no one wants to handle all of the load of being ALWAYS on call)...

I guess it all boils down to how much one (or a company) is willing to pay for a certain level of comfort (and it manifests itself here at the level of not handling server hardware).

Israelis are famous for not wanting to pay extra for comfort as opposed to, say, Americans :-) 

I sympathise with not measuring everything in $$$ and pure numbers since there are other factors, mental and psychological, that can make a huge impact on a company.

I'm sure Bret would have loved to handle some of those failures by spinning up a new cloud server instance, even in the middle of the night, while sitting in his underwear and clicking a few buttons, instead of getting dressed and driving to a data center.

Eran

Tor

unread,
Feb 12, 2011, 11:43:35 AM2/12/11
to iltec...@googlegroups.com

Ori Lahav

unread,
Feb 13, 2011, 5:17:07 AM2/13/11
to iltec...@googlegroups.com
mmmm....

Nathan, Outbrain's sysadmin in NY for the last 2 years, is usually on these threads.
Nathan, feel free to pop into this thread and tell us: how many times in the last 2 years have you had to drive to the datacenter in the middle of the night to fix something?

Arik Fraimovich

unread,
Feb 13, 2011, 5:51:28 AM2/13/11
to iltec...@googlegroups.com
I think that driving to the DC in the middle of the night is just one extreme example that Bret used to make the point more clearly. Also, asking the sysadmin what he thinks about systems work is kind of wrong - it's the same as asking your CFO what he thinks about... mmm... finance-related stuff. :) Of course doing the taxes at the end of the year sucks, but he (hopefully) enjoys the rest of his job.

A better question to ask is how much you enjoyed dealing with the operations stuff before you had a sysadmin.

Nathan Milford

unread,
Feb 13, 2011, 9:59:12 PM2/13/11
to ILTechTalks
I can't think of any particular instance where I needed to go to the
data center late at night or under any real emergency circumstances.

Our infrastructure is robust enough to handle most failures and it is
only getting better.

Traffic can easily be rerouted to another facility as needed. Local
slaves can easily be promoted to masters and our other data stores
(hadoop, cassandra) deal pretty gracefully with failures. Many of
these issues can be handled with ssh on my android.

Thanks to the thoughtfulness of our vendors, developers and engineers when planning/designing our systems and infrastructure, there are very few situations that would warrant me dropping everything to rush over. Component-level redundancy (RAID), machine-level redundancy (replication, HAProxy), wiring redundancy (interface bonding/teaming, switch stacking), power redundancy, and transit/transport feed redundancy all mean I get to sleep at night. IP KVM, IPMI and remote reboot all mean I

If your sysadmin is rushing out to the datacenter like that, then you
have a design problem.

The only things that keep me up at night are bugs and application
problems that I can fix from my netbook.

:)



Nathan Milford

unread,
Feb 13, 2011, 10:00:52 PM2/13/11
to ILTechTalks
Need to finish one sentence:

IP KVM, IPMI and remote reboot all mean I can deal with a lot of
physical matters remotely.

Zivo

unread,
Feb 15, 2011, 4:06:51 AM2/15/11
to ILTechTalks
One small comment on this:
"Cost of CoLo (space you take - I know a 40U rack, at least on the net, costs ~$1000/mo and these database beasts are usually 4U, if I'm not mistaken)"
The 40U is not the relevant number - the point is that ANY DC will limit you on power, i.e. how much power they are willing to provide to a single rack (it's not only the power they push in, it's also the heat they need to take away).
Also, most DCs do not distinguish between the power you reserve and the power you actually consume: if you need 10A and you'd like a dual feed (for redundancy), they will bill it as 20A (although you never consume more than 10A!!!).
It's hard to find a DC offering more than 6kW per rack => 6000W / 220V => ~25-30A => 2x15A feeds.
To cut a long story short - if you happen to run a disk shelf, it may take only 4U, but it may draw 10A.
As long as you have only one rack, with all the infrastructure to work locally (KVM + screen), your rack may indeed fill up; otherwise, your 40U is meaningless and you must calculate by power, not by U!
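To put that point in numbers, a sketch with assumed figures (the 6kW/220V rack budget from above; the per-server draw is an assumption for illustration only):

```python
# Rack capacity is set by the power budget, not by the 40U of space.
RACK_POWER_W = 6000
VOLTAGE = 220
SERVER_DRAW_W = 250        # assumed average draw of one 1/2U server under load
DISK_SHELF_DRAW_W = 2200   # ~10A at 220V, as in the example above

amps = RACK_POWER_W / VOLTAGE                      # ~27A total
servers_by_power = RACK_POWER_W // SERVER_DRAW_W   # 24 servers fit the power budget
servers_by_space = 40 // 0.5                       # 80 slots of 1/2U fit the space

print(f"{amps:.0f}A budget -> {servers_by_power} servers by power vs. {servers_by_space:.0f} by space")
print(f"one disk shelf alone: {DISK_SHELF_DRAW_W / VOLTAGE:.0f}A")   # ~10A
```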

Gilad Ben-Nahum

unread,
Feb 15, 2011, 6:55:07 AM2/15/11
to iltec...@googlegroups.com
Great thread - very useful for us at SyGen, as we are exploring exactly this issue right now.

I wonder if anyone knows relevant Israeli service providers? We know / have met a few, such as Med-1 (http://www.med-1.com) and Interhost (http://www.interhost.co.il/), but would appreciate your recommendations on the same.
The reason we are exploring Israeli vendors is the potential latency we might see if using clouds outside of Israel (our product interacts with several Israeli web services, and performance is crucial for us).
We will run some stress tests on Amazon to confirm or dismiss that concern before taking a decision.

Thanks in advance, Gilad
--
________________________

Gilad Ben-Nahum

mobile: +972-54-6474743
e-mail: bng...@gmail.com

Ori Lahav

unread,
Feb 15, 2011, 7:21:12 AM2/15/11
to iltec...@googlegroups.com
Gilad, try TripleC - they have a nice new datacenter and are also offering cloud solutions.
I don't have any experience with them, so I can't recommend them one way or the other - talk to them.

Ori Lahav

unread,
Feb 15, 2011, 7:59:45 AM2/15/11
to iltec...@googlegroups.com
Zivo,
This might have been relevant in the early 2000s.
In 2011, with all the green awareness among vendors, you can get:
1. A DB machine (high spec, as listed above) in 1/2U.
2. It takes 1/4 of the power of the old 4U (Dell 6950-like) machines.
3. In the post-Facebook/Google era you don't use disk shelves; you distribute/replicate your DB on commodity hardware (NetApp is a bad word in these architectures).
4. The new machines also generate much less heat and work just fine at room temperature, so colocations don't over-cool like they used to.

At Outbrain, with our colocation vendor, we pretty much utilize the full 40U - and with the 1/2U machines you can fit even more (if you have enough switch ports).

Bottom line - all the frightening costs of DC management from the 2000s are not relevant anymore. It is just know-how and engineering. Just as you should know how to write efficient code, you should know how to build, architect and maintain efficient infrastructure.
Again, think about hundreds of servers, not fewer than 10.

That's all.


Zivo

unread,
Feb 16, 2011, 5:27:38 AM2/16/11
to ILTechTalks
Ori,
If I wasn't clear, I'll rephrase...
What I'm saying is:
The height of the rack plus the size of the servers is not the right way to calculate how many servers you can place in one rack. When it comes to the "density" of servers (and storage), in most cases the total power sets the limit, not the physical size.
Having said that, it drills down to basic details...

* Do I need a graphics chipset? I may want to use a lower-powered server (and have two servers) rather than a high-end server which includes a powerful graphics chipset I don't need (be aware, some applications do use the graphics chipset for encoding pictures / video).

* For storage, which type of disks am I using? Am I space-sensitive or performance-sensitive? Should I go with as few large disks as possible => lower performance + lower power consumption, or many small disks => higher performance, but higher power consumption?

* Should I go "pizza box" (1U or 1/2U) or should I go with blade technology??? (You'll be surprised - blades give very good physical density, but pizza boxes give a better performance/power ratio! Blades waste a lot of power on fans.)
* Should I go with dual-power-supply servers, or single (and maybe ask the DC for a 3rd and 4th power feed, from a different UPS)?

To cut a long story short: today, more than ever (and much more because of the "green" trend), density is power-bound, not physically bound.

Last but not least - when it comes to a high-density rack, what about the physical layer, the Ethernet cables and ports??? I had racks with 2x48 Ethernet ports - try to imagine what the cabling looks like :-(

And yes - all of these small differences are more applicable at 100+ servers, not 10 (this is one of the reasons Google DCs have their own custom servers).

Zivo

unread,
Apr 9, 2011, 6:22:53 AM4/9/11
to ILTechTalks
http://opencompute.org/
and
http://opencompute.org/servers

If you insist on buying your own servers - worth a read.