How to autoscale a CF deployment using BOSH?


Guillaume Berche

Jul 26, 2012, 5:35:31 PM
to vcap...@cloudfoundry.org

Hello, 

I'm wondering where to learn more about how to autoscale a Cloud Foundry deployment using BOSH.

The use-case would be a customer deciding to run a load-test campaign on a private Cloud Foundry instance, suddenly requesting a large number of app instances and service instances for a short duration. As an SRE for this private CF instance, I would like to avoid having to oversize my VM pools for such events.

I understand the way to scale a BOSH deployment is to modify the deployment manifest to increase the size of the resource pools and the instance counts of jobs. In the case of a Cloud Foundry release, additional DEA or service jobs (mysql, postgresql) would automatically register with the NATS bus and be ready to serve traffic.

While this CF BOSH deployment scaling can be automated (running unattended bosh deployments), the scale-up would need to be triggered when CF resources get close to their limits, and act accordingly. Similarly for scale-down.

Is there any publicly available feature for this?

This seems related to the high-water/low-water marks produced by each service and sent by the collector to the TSDB server, which would in turn have some alerting thresholds on those watermark metrics and act on the BOSH deployment: identify the matching job in the deployment and increase its instance count.

For the scale-down, I guess the DEAs and services also need to be consolidated to avoid fragmentation. I recall one can send a SIGHUP to ask a mysql instance or a DEA to shut down and migrate its data to another. I guess that when performing a scale-down, one would first SIGHUP the service/DEA instances that have little or no usage, based on the watermark metrics sent to the TSDB server.
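
Sketched in code, such a watermark-driven decision could look like the following; the thresholds and the notion of a single "utilization" number per job are illustrative assumptions on my part, not actual CF collector metrics:

```python
# Hypothetical watermark-driven scaling decision. HIGH_WATER, LOW_WATER and
# the idea of one "utilization" fraction per job are assumptions for
# illustration, not real collector output.

HIGH_WATER = 0.85  # scale up above this pool utilization
LOW_WATER = 0.30   # scale down below this pool utilization

def desired_instances(current, utilization):
    """Return the instance count a job should move to, given its utilization."""
    if utilization > HIGH_WATER:
        return current + 1
    if utilization < LOW_WATER and current > 1:
        return current - 1
    return current
```

A scaler built on this would then rewrite the job's instance count in the deployment manifest and run an unattended bosh deploy.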

Any experience with this cf autoscaling which can be shared?

Thanks in advance,

Guillaume.

Vadim Spivak

Jul 26, 2012, 8:27:35 PM
to vcap...@cloudfoundry.org
We do capacity planning which involves manual scaling since at the end of the day there are real resources and/or account limits, whether it's on AWS or on a private cloud. 

Having said that, you could automate bosh deployments based on some metrics. We have considered implementing dynamic pools, but so far that discussion has been deprioritized in favor of improving the experience and stability of the current features for vSphere, AWS, and OpenStack.

-Vadim

Guillaume Berche

Jul 27, 2012, 1:58:33 PM
to vcap...@cloudfoundry.org
Thanks, Vadim, for sharing your current capacity planning and manual scaling practices on CF.com, and the dynamic pool feature that was considered.


> at the end of the day there are real resources and/or account limits, whether it's on AWS or on a private cloud.

Not sure I understand: there are indeed real provisioned resources and they are not infinite; however, one would keep the ability not to permanently allocate all available resources to CF when they are not needed. Autoscaling a CF deployment on AWS would make both economic and environmental sense if there are sufficient workload variations.

Even in the case of a private vSphere setup dedicated to a CF deployment, I suspect DPM would manage the power usage of the ESXi hosts more efficiently if extra VMs are not started than if CF jobs are left to idle, hence reducing OPEX costs, which I understand become significant in large setups.

Guillaume.

Jesse Zhang

Jul 27, 2012, 4:41:29 PM
to vcap...@cloudfoundry.org
You are basically asking for this feature:
  1. Periodically collect the capacity metrics of DEAs / Stagers.
  2. When you feel you need more headroom, add more DEAs to your deployment manifest and "bosh deploy".
  3. When you feel the load is too low, reduce the number of DEAs and re-deploy.
Don't you need to request a couple of reserved instances in AWS before you can actually execute step #2?
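
For what it's worth, steps 2 and 3 can be scripted. Here is a minimal, line-based sketch; `set_job_instances` is a hypothetical helper (not a bosh tool), and it assumes each job entry in the manifest carries its own `instances:` key:

```python
import re

def set_job_instances(manifest_text, job_name, count):
    """Rewrite the 'instances:' value of the named job in a BOSH manifest.

    A simplistic sketch: it assumes the job's 'instances:' key appears
    after its '- name: <job>' line and before the next job entry. After
    writing the result back to disk, one would run an unattended deploy
    (e.g. a non-interactive "bosh deploy").
    """
    lines = manifest_text.splitlines()
    in_job = False
    for i, line in enumerate(lines):
        if re.match(r"\s*-\s*name:\s*" + re.escape(job_name) + r"\s*$", line):
            in_job = True
        elif re.match(r"\s*-\s*name:", line):
            in_job = False
        elif in_job and re.match(r"\s*instances:\s*\d+", line):
            lines[i] = re.sub(r"\d+", str(count), line)
            return "\n".join(lines)
    raise ValueError("job not found: " + job_name)
```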

Jesse

Guillaume Berche

Jul 30, 2012, 8:59:32 AM
to vcap...@cloudfoundry.org
> You are basically asking for this feature:
> 3. When you feel the load is too low, reduce the number of DEAs and re-deploy.

Yes, ideally, the DEA/service instances with the least load would be asked to migrate their remaining apps/services to other instances, and then exit. Then, if bosh dynamic pools get added, bosh would reduce the associated pool as the maximum number of idle VMs in the pool is reached.


>Don't you need to request a couple of reserved instances in AWS before you can actually execute step #2?

I understand this could work with the AWS on-demand instances that bosh currently provisions dynamically upon a "bosh deploy" command.

It would however be great if bosh supported the multiple instance types (on-demand, reserved light-utilization, and possibly spot). Scanning the bosh code, I could not yet find support for this, nor a way to give bosh a set of manually provisioned instances to use (only reserved IPs in reserved network types).

Let's however dream and imagine how a pool "deas-dynamic-pool-reserved-li", configured to pool Light Utilization reserved instances of size m1.small, could look. Combined with the dynamic pooling feature Vadim mentioned, when new instances need to be created, the pool could use an algorithm similar to:
1- first, use an instance running without any allocated job, if any;
2- if none, start a previously stopped Light Utilization reserved instance, if any;
3- if none, start a new Light Utilization reserved instance, if the max pool size is not reached;
4- otherwise, start a regular on-demand instance.
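
As a toy illustration of that preference order (pure pseudologic; the pool state is modeled as plain counts, and nothing here maps to real bosh or AWS APIs):

```python
# Toy model of the four-step instance-selection order described above.
# None of this corresponds to real bosh or AWS calls.

def pick_source(idle_count, stopped_reserved, running_reserved, max_reserved):
    """Return where the next VM for the pool should come from."""
    if idle_count > 0:
        return "reuse-idle"              # 1. instance running with no job
    if stopped_reserved > 0:
        return "start-stopped-reserved"  # 2. restart a stopped reserved instance
    if running_reserved < max_reserved:
        return "launch-reserved"         # 3. new reserved instance, pool not full
    return "launch-on-demand"            # 4. fall back to on-demand
```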

Some pools could even be configured with spot instances for jobs that can handle the unplanned loss of some instances well (a Hadoop cluster, or a cache cluster). In some use-cases, it would make economic sense to use such bosh features on AWS.

Guillaume.



Nicholas Kushmerick

Jul 31, 2012, 12:03:35 AM
to vcap...@cloudfoundry.org
> Ideally, the DEA/Services instances with the least load would be asked to migrate their
> remaining apps/services to other instances, and then exit.

Just to mention.... a little bit of that functionality already exists for services: there is support for migrating instances from one node to another.  Note that it is a manual process, not automatic.

If you're interested, here's a good place to dive in: https://github.com/cloudfoundry/vcap-services/tree/master/tools/rebalance

-- Nick

Guillaume Berche

Aug 3, 2012, 4:10:15 AM
to vcap...@cloudfoundry.org
Thanks Nicholas for the pointer, this is quite interesting and useful.

Guillaume.

ramonskie

Jan 16, 2014, 5:56:11 AM
to vcap...@cloudfoundry.org
is there any new information about how to auto-scale your Cloud Foundry deployment?

On Friday, August 3, 2012 at 10:10:15 AM UTC+2, Guillaume Berche wrote:

James Bayer

Jan 17, 2014, 2:50:12 AM
to vcap...@cloudfoundry.org
you can use the APIs to scale bosh jobs or scale cloud foundry apps.




--
Thank you,

James Bayer

ramonskie

Jan 17, 2014, 3:43:07 AM
to vcap...@cloudfoundry.org
i can't seem to find documentation for the bosh API.
is that right, or is it hidden somewhere? :)

On Friday, January 17, 2014 at 8:50:12 AM UTC+1, James Bayer wrote:

James Bayer

Jan 17, 2014, 12:27:15 PM
to vcap...@cloudfoundry.org
i was talking with ferdy and abhi. both said that the REST API for BOSH is not well-documented and should almost be considered private, as it has enough parts that the team will want the flexibility to change because of technical debt. abhi recommends viewing the CLI commands as the client API interface into BOSH for now, until a nice HTTP client API is improved and published. greg has recently joined the BOSH team as the PM and this will be on his list of items to research and take action on.

Dr Nic Williams

Jan 17, 2014, 1:30:52 PM
to vcap...@cloudfoundry.org
Please don't change the bosh API. It's not to be considered private. :(

If you want a new API, please consider versioning.

Dr Nic Williams

Jan 17, 2014, 1:39:27 PM
to vcap...@cloudfoundry.org
It is especially important that the bosh director doesn't break its API, as there is almost NO versioning for bosh - just commits that pass CI being released. It would be almost impossible to build tools to manage all the BOSHes in the world if the API changes without versioning support.

James Bayer

Jan 17, 2014, 9:34:07 PM
to vcap...@cloudfoundry.org
nic, the state of the BOSH API is that it's only represented in ruby code and ruby tests. the cloud controller API at least now has a start at a set of api docs and tests [1]. the BOSH API needs something similar. until that happens, i recommend that new users stick with the bosh CLI as the interface. those who understand how to read ruby code and use the specs to reverse engineer the API are advanced users, and they can have confidence that we won't intentionally change the API on them without notice.

Thank you,

James Bayer

ramonskie

Jan 20, 2014, 5:34:39 AM
to vcap...@cloudfoundry.org
thanks for the info :)
we will check the code and i will post my findings if it works


On Thursday, July 26, 2012 at 11:35:31 PM UTC+2, Guillaume Berche wrote:

Iwasaki Yudai

Jan 29, 2014, 11:00:05 AM
to vcap...@cloudfoundry.org
We have some progress on autoscaling BOSH deployments.
Our code is already working fine. We will open-source it this February.
Wait for a while :)

Guillaume Berche

Jan 29, 2014, 3:06:57 PM
to vcap...@cloudfoundry.org
Yeah, that's great! I can't wait to learn more. Can you describe a bit more how it works, and whether it has prereqs on the bosh releases or if it will also apply to CF releases?

Thanks,

Guillaume.

Iwasaki Yudai

Jan 31, 2014, 3:29:56 AM
to vcap...@cloudfoundry.org
The scaler basically requires no prereqs on releases/deployments. If you want to activate the CF varz support feature, you need a Collector to pass varz metrics to the scaler.

Here's a brief architecture image (google login required?).
https://drive.google.com/file/d/0BxyIZRLnNHpZQnBMSHVudDRaeGs/edit?usp=sharing

I think I will be able to make it public next week.

James Bayer

Jan 31, 2014, 9:47:15 AM
to vcap...@cloudfoundry.org
this is very exciting! people ask for this all the time. i like the straightforward approach. thanks for sharing the preview.


Tammer Saleh

Jan 31, 2014, 1:16:58 PM
to vcap...@cloudfoundry.org, Greg Oehmen
Hi Iwasaki,

+ Greg, the new BOSH PM

I love the way this is constructed, being well-separated from both CF and BOSH. A couple of questions:
  • Did you implement a new collector plugin to gather the data from CF?
  • Why the choice to listen to BOSH NATS instead of grabbing health data from the Health Monitor?

Cheers,
Tammer Saleh

James Bayer

Jan 31, 2014, 1:46:42 PM
to vcap...@cloudfoundry.org, Greg Oehmen, David Lee
+david
fixed greg's email to the gopivotal domain

in even more detail: we have a team working with a JMX extension [1] that takes metrics both from the BOSH Health Monitor and from the CF Collector. the JMX component simply needs to look like an OpenTSDB endpoint, because the CF Collector and BOSH Health Monitor already know how to talk to one from an earlier prototype of collecting metrics in CF v1 with OpenTSDB. this enables us to overlay both CF metrics and BOSH vm metrics onto the same MBean structure. so that might be an approach you could utilize so you wouldn't have to listen to NATS or build special plugins. we're still learning about the JMX solution, but it seems to be working out ok thus far.
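
for readers unfamiliar with it, the wire format such a fake OpenTSDB endpoint has to accept is just a line-oriented "put" protocol; a minimal parser might look like this (a sketch of the telnet-style format only, not the JMX extension's actual code, and the metric name in the example below is made up):

```python
def parse_put_line(line):
    """Parse one OpenTSDB telnet-style line:

        put <metric> <unix-timestamp> <value> <tag>=<value> ...

    Returns (metric, timestamp, value, tags). This sketches the wire
    format only, not the JMX extension's implementation.
    """
    parts = line.strip().split()
    if len(parts) < 4 or parts[0] != "put":
        raise ValueError("not an OpenTSDB put line: %r" % line)
    metric = parts[1]
    timestamp = int(parts[2])
    value = float(parts[3])
    tags = dict(p.split("=", 1) for p in parts[4:])
    return metric, timestamp, value, tags
```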

Message has been deleted

Dmitry

Feb 2, 2014, 10:00:02 AM
to vcap...@cloudfoundry.org

Fantastic feature which is essential to operate private CF, thank you very much for sharing this!
From the block diagram I can't understand how the director will determine which deployment to use (and which release). Could you please explain how it will work (a sequence diagram would be highly appreciated)?
Thanks,
Dmitry

Iwasaki Yudai

Feb 7, 2014, 6:45:21 AM
to vcap...@cloudfoundry.org
Oops, I had forgotten to resend my post (it seems I accidentally deleted it).

Tammer,
I'm using the 'cf_metrics' historian included in the collector. I modified a single line to use 'http' instead of 'https', as the historian has a hard-coded URL scheme.
I have no special reason for the choice of the bosh NATS collector. I thought changing the configuration of the bosh monitor to send metrics to the scaler would be somewhat harder than listening to BOSH NATS, especially with MicroBOSH.

James,
JMX looks nice. Do you have any plan to implement an autoscaler with JMX? Since it has the capability to trigger events according to metrics, I guess JMX might be a good framework to build an autoscaler on.

Dmitry,
The autoscaler has a config file to determine the deployments and jobs to scale. I'm now planning to move these configurations to deployment manifests.
The scaler retrieves the current statuses of deployments from the director and does not change their releases and stemcells. Properties other than instance counts are simply kept intact.

Thanks,
Yudai

James Bayer

Feb 8, 2014, 2:42:51 AM
to vcap...@cloudfoundry.org
yudai, at pivotal we certainly would like to have richer metrics to use as a basis for autoscaling. i'm not sure that we would put the autoscaling directly in the platform vs having an approach like your proposal initially, where we can expose the metrics and have an external component query and take action on the metrics. we'll be sure to update everyone as the plans progress.

Iwasaki Yudai

Feb 13, 2014, 3:48:41 AM
to vcap...@cloudfoundry.org
I've published our auto scaler:

https://github.com/nttlabs/bosh-scaler

James, I'm still not sure if my approach is the best way, either. I hope this scaler will help users to realise their exact needs and use cases for a BOSH scaler.

Nicolas Maurer

Feb 13, 2014, 4:10:41 AM
to vcap...@cloudfoundry.org
thanks for sharing this tool. I already had a quick look,
and I like the way this tool works together
with existing components.

We'll definitely give it a try ASAP.

James Bayer

Feb 13, 2014, 10:52:48 AM
to vcap...@cloudfoundry.org
yudai, this looks very interesting! thank you for sharing this approach. after reviewing the sample config file [1] with ferdy, i'm very excited by the possibilities. i'm going to show this to the bosh and runtime teams.

Iwasaki Yudai

Feb 13, 2014, 11:27:31 AM
to vcap...@cloudfoundry.org
Thanks James.

I'm planning to move the rules from the config file into deployment manifests, if possible, to make per-deployment dynamic configuration easier than in the current design.

Christopher Ferris

Feb 15, 2014, 7:34:11 AM
to vcap...@cloudfoundry.org
Iwasaki-san,

This looks quite interesting/promising. Thanks for posting this work. I am curious as to whether NTT is considering taking this through the CF incubation process [1]?


Chris

Iwasaki Yudai

Feb 17, 2014, 12:17:59 AM
to vcap...@cloudfoundry.org
Ferris-san,

I'd like to put this project through the incubation process if possible.
I believe more discussion in the community is necessary to create a scaler with a good design.

Thanks,
Yudai

Christopher Ferris

Feb 17, 2014, 6:18:46 AM
to vcap...@cloudfoundry.org
Iwasaki-san,

I'm glad to hear that. I think that there are some IBMers that would be interested in collaborating on this. We'll start the internal process reviews to enable us to engage.

Chris

Iwasaki Yudai

May 14, 2014, 2:07:42 AM
to vcap...@cloudfoundry.org
I updated the AutoScaler with some new features.

Now AutoScaler can load scaling rules (policies) from deployment manifests, so you don't need to modify AutoScaler's config file to update rules.
I also implemented a new collector to receive metrics in TSDB format from the CF collector.

See the README for detailed information.

Thanks,
Yudai 

ramonskie

May 14, 2014, 3:59:11 AM
to vcap...@cloudfoundry.org
what good news to start your day :)
thanks for all the hard work, and keep it up...

On Wednesday, May 14, 2014 at 8:07:42 AM UTC+2, Iwasaki Yudai wrote: