[bosh-users] High Availability on bosh.


Alan Morán

Jul 3, 2014, 3:35:26 PM
to bosh-...@cloudfoundry.org
Hi,
I wanted to ask the community which approaches are being used to deploy BOSH.
Other than the inception -> microbosh -> bosh path, how would you set up your BOSH servers to make sure they don't go down?
Let's say for some reason I lose the MicroBOSH VM; how can I regain control over my BOSH?
Also, does anyone know if it's possible to mirror between BOSHes?
Say, two different BOSH servers that each have the Resurrector enabled and check on each other?
I could not find any docs on this.

Thanks,
- - - -
Alan Moran
 
Altoros — Cloud Foundry deployment, training and integration

http://bonzofenix.com/ 

Alexander Lomov

Jul 4, 2014, 10:52:40 AM
to bosh-...@cloudfoundry.org, alan....@altoros.com
Hi, Alan.

I'm sure you can save your MicroBOSH installation by using a persistent disk for its data.
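For reference, a MicroBOSH manifest requests that disk under resources; the sizes, region, and credentials below are placeholders rather than values from any particular setup:

# micro_bosh.yml (fragment) -- illustrative values only
name: microbosh-example

resources:
  persistent_disk: 20480        # MB; the Director's database and blobstore live on this disk
  cloud_properties:
    instance_type: m1.medium    # placeholder instance type

cloud:
  plugin: aws
  properties:
    aws:
      access_key_id: AWS_ACCESS_KEY       # placeholder
      secret_access_key: AWS_SECRET_KEY   # placeholder
      region: us-east-1
      default_key_name: bosh
      default_security_groups: [bosh]

Because the Director's state sits on that persistent disk, a replacement VM can reattach it, which is the basis of the recovery approach discussed later in this thread.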

Also I remember interesting discussion about BOSH disaster recovery here: https://groups.google.com/a/cloudfoundry.org/d/msg/bosh-dev/H7-OUnT0MJQ/cnbES3yidBUJ


Best wishes,
Alex L.

Greg Oehmen

Jul 7, 2014, 11:54:53 AM
to bosh-users
Alan, All:

I really like the HA approach of using MicroBOSH to deploy BOSH, and then using MicroBOSH (or perhaps the first BOSH) to deploy a second BOSH, such that the two BOSH deployments monitor and support each other.  At that point, the MicroBOSH is no longer necessary.  It is a topic that I want to explore further with the BOSH team in the near future.

Best

Greg

Greg Oehmen
Cloud Foundry Product Manager - Bosh
Pivotal



Alan Morán

Jul 7, 2014, 10:54:03 PM
to bosh-...@cloudfoundry.org
Hi Greg,
This is good news; I think an approach of two BOSHes taking care of each other would be great. Do you think this would be possible to do soon? Can it be achieved with the current features of BOSH?
I am researching a suitable production setup for BOSH that can be put in place within the next month if possible and that won't depend exclusively on MicroBOSH. I think it's something crucial for production environments.
How is the community resolving this at the moment?

Thanks,

- - - -
Alan Moran
 
Altoros — Cloud Foundry deployment, training and integration

http://bonzofenix.com/ 

Andrew Shafer

Jul 8, 2014, 3:20:37 AM
to bosh-...@cloudfoundry.org

I'm just going to throw this out there.

I'm new and BOSH isn't going to get my main focus for the moment, but I do think solving the problems BOSH solves is a big deal and is going to be an even bigger deal.

Whenever I see someone heading towards parallel "watcher watches the watcher" failover HA solutions, I believe the better approach is to reframe the problem and make the single system more fault-tolerant and self-remediating.

Fault tolerance isn't a feature that can simply be implemented; it typically requires a methodical architectural approach to the whole system. This isn't free, but it is arguably more predictable, maintainable and operable.

As a user trying to get things done today, that might be too much to ask, and the BOSH inception HA strategy might be sufficient for many deployments, but I at least want to plant the seed of thinking in this direction.

This presentation changed the way I think about these problems: http://www.infoq.com/presentations/Systems-that-Never-Stop-Joe-Armstrong

The talk appears to be about Erlang, but it is really about building reliable systems and the laws those systems must obey. I recommend watching it no matter what kind of systems and languages you use.

When I have a better idea about BOSH internals and how things fail in production, I hope to offer more than platitudes, but that is all I have for now.

James Bayer

Jul 8, 2014, 11:21:52 AM
to bosh-users
i see it andrew's way on this topic. i also think the precise "how" (a single BOSH or multiple) is a detail that ideally the product team can leave for engineering to drive with product team input (the result certainly affects the BOSH UX, system requirements, etc). ultimately, the product's high-level requirement and priority for "high availability" is the main way we should express this. having said that, let me switch hats and opine on some ideas that bounce around my head. 

in my ideal view of a future BOSH, there is a very straightforward way to get started with BOSH as a single node. nic, ferdy, dmitriy, and pivotal's product that uses BOSH, called Ops Manager, have all experimented with various approaches:
* the current microbosh install from the bosh CLI approach [1]
* an Inception VM Image that has everything needed to get the initial BOSH instance deployed (ferdy, nic and Ops Manager all have used this approach at times). this certainly simplifies some of the steps in the microbosh install
* a vagrant plugin [2] that uses vagrant with bosh deployment management to have a guest VM get software using a bosh deployment manifest. dmitriy has some ideas about how we can use an approach like this to create inception VMs, stemcells, etc.

once you have this single node, it would be a good UX if the BOSH instance were able to expand itself to multiple nodes for high availability and scaling benefits. for example, when MicroBOSH first starts, there are many co-located components like the director database, NATS, health monitor, etc on the single node. the CF MySQL release [3] is adding an HA MySQL and is currently working on a multi-node deployment of MySQL with a recovery time objective of several minutes, called "Durable MySQL" [4]. i can imagine having the single-node BOSH deploy an HA MySQL instance comprised of multiple nodes, plus some type of BOSH errand which migrates data out of the co-located database into the HA MySQL database. similarly, i can imagine the single-node BOSH deploying multiple additional gnatsd instances and then turning off the co-located nats process. now this BOSH instance has a highly available database and a highly available messaging tier. use this pattern to finish breaking out the co-located components.
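to make that concrete, a hypothetical v1-style manifest fragment for standing up dedicated gnatsd instances might look like the following; the release, job, and property names are made up for the sketch and aren't taken from an existing release:

# hypothetical fragment -- release/job/property names are illustrative only
releases:
- name: nats              # assumed: some release providing a gnatsd job
  version: latest

jobs:
- name: nats
  templates:
  - name: gnatsd          # assumed job name
  instances: 3            # spread across hosts/AZs for availability
  resource_pool: small
  networks:
  - name: default
    static_ips: [10.0.16.11, 10.0.16.12, 10.0.16.13]

properties:
  nats:
    user: nats
    password: REPLACE_ME
    port: 4222

once those instances are healthy, the director would be repointed at them and the co-located nats process switched off.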

currently BOSH deploys new VMs for these activities. in a future world we can imagine BOSH reusing Diego to run both short and long running bosh processes such as NATS and MySQL HA components with constraints like "distribute instances of this process to separate hosts, ideally in separate AZs" and we can shift from a VM mindset to a Container mindset.

i'm done opining. it's going to take a while to get somewhere like this, but that's what i've been thinking about.

--
Thank you,

James Bayer

Greg Oehmen

Jul 8, 2014, 2:36:22 PM
to bosh-users
Thanks for the input, both of you. Your thoughts are appreciated, as are the opportunities to learn.

Greg



Greg Oehmen
Cloud Foundry Product Manager - Bosh
Pivotal


David Laing

Jul 9, 2014, 6:03:50 PM
to bosh-users
There is a wiki page on how to recover a "lost" MicroBOSH - basically, if you can recover the persistent disk (e.g. from an AWS snapshot), it's pretty straightforward to deploy a new MicroBOSH and attach the existing disk to it - https://github.com/cloudfoundry-community/cf-docs-contrib/wiki/Backup-and-disaster-recovery
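To sketch the flow (treat the exact keys as illustrative - they vary between bosh CLI versions): the micro plugin tracks its state in bosh-deployments.yml next to your micro_bosh.yml, so after restoring the volume from a snapshot you point that state file at the recovered disk and re-run bosh micro deploy, which builds a fresh VM and attaches the existing disk:

# bosh-deployments.yml (fragment, illustrative -- key names may differ by CLI version)
instances:
- :id: 1
  :name: microbosh-example
  :stemcell_cid: ami-xxxxxxxx     # stemcell for the replacement MicroBOSH VM
  :vm_cid: ~                      # cleared, since the old VM is gone
  :disk_cid: vol-0123abcd         # the recovered persistent disk (or the volume restored from a snapshot)

With that in place, bosh micro deploy <stemcell> should recreate the VM and reattach the Director's data.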

It's worth remembering that as long as you don't use your MicroBOSH for DNS, your BOSH deployments are fairly independent of the MicroBOSH that deployed them; i.e., if your MicroBOSH goes down, your deployed clusters still operate fine.





--
David Laing
logsearch.io - build your own open source cloud logging cluster
http://davidlaing.com

hel...@acm.org

Jul 10, 2014, 8:17:28 AM
to bosh-...@cloudfoundry.org
On Thursday, July 10, 2014 12:03:50 AM UTC+2, David Laing wrote:
It's worth remembering that as long as you don't use your MicroBOSH for DNS, your BOSH deployments are fairly independent of the MicroBOSH that deployed them; i.e., if your MicroBOSH goes down, your deployed clusters still operate fine.

In that case, we seem to be doing something wrong. We do use MicroBOSH as DNS for the BOSH, whilst BOSH serves DNS to its deployments.
I've never been happy about this, but avoiding it seemed to involve hacking on resolv.conf and operating directly on the PowerDNS PostgreSQL database. Is there an easier way?

Regards

Jon Kåre
Jon Kåre Hellan, UNINETT AS, Trondheim, Norway

David Laing

Jul 11, 2014, 3:10:18 AM
to bosh-users
Steps to avoid relying on PowerDNS

1.  In your deploy manifest, refer to other VMs by static IP (on AWS this requires deploying to a VPC)
2.  In your network blocks, specify the dns: property as your cloud's DNS servers (on AWS VPC this is 10.0.0.2, I think); see the manifest sketch below
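A minimal sketch of those two steps in a v1-style deployment manifest; the subnet, IPs and job names are placeholders, and 10.0.0.2 assumes a VPC whose CIDR starts at 10.0.0.0 (the Amazon resolver sits at the VPC base address plus two):

# illustrative fragment -- names, ranges and IPs are placeholders
networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.16.0/24
    gateway: 10.0.16.1
    dns: [10.0.0.2]               # step 2: the cloud's DNS instead of the Director's PowerDNS
    static: [10.0.16.10, 10.0.16.11, 10.0.16.12]
    cloud_properties:
      subnet: subnet-xxxxxxxx

jobs:
- name: postgres
  templates:
  - name: postgres
  instances: 1
  resource_pool: small
  networks:
  - name: default
    static_ips: [10.0.16.10]      # step 1: pin the VM to a known address...

properties:
  mydb:
    address: 10.0.16.10           # ...and reference it by IP rather than a BOSH DNS name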

BOSH gurus, have I missed any steps?

D

Dr Nic Williams

Jul 11, 2014, 12:00:06 PM
to bosh-...@cloudfoundry.org, bosh-users
Alternatively, you could use Consul to advertise services across your BOSH deployments. That is, your deployments would take responsibility for service advertisement and discovery, rather than relying on BOSH-defined static IPs (not available on EC2 or Nova) or BOSH DNS. 
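As a rough sketch of what that could look like, here is a deployment manifest fragment that colocates a Consul agent template with an existing job so the job registers its own service; the release, job and property names are hypothetical stand-ins rather than references to a specific Consul BOSH release:

# hypothetical fragment -- release/job/property names are made up for illustration
releases:
- name: consul            # assumed: a release providing a consul_agent job
  version: latest

jobs:
- name: postgres
  templates:
  - name: postgres
  - name: consul_agent    # colocated agent advertises this node's services
  instances: 1
  resource_pool: small
  networks:
  - name: default

properties:
  consul_agent:
    join_hosts: [10.0.16.5, 10.0.16.6, 10.0.16.7]   # your Consul server cluster
    services:
      postgres:
        port: 5432        # consumers then look up postgres.service.consul instead of a hard-coded IP

Consumers in other deployments would then resolve the service through Consul's DNS interface or HTTP API rather than through static IPs or BOSH DNS.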