Active/passive nodes with BOSH.

Ben

Jun 18, 2014, 6:11:05 AM

to bosh...@cloudfoundry.org

As part of our disaster recovery (DR) strategy, we need passive nodes at our DR site for some services. Rather than adopting a different strategy for each of these services, we'd like to use asynchronous disk-level storage replication.

We have forked BOSH, and I'm implementing this by modifying the BOSH director and agent to do the following:

1. Add a passive mode to the job definition which, when enabled, prevents the job from starting.

2. Change the BOSH agent to mount and unmount persistent storage around job start and stop, instead of in response to the director's mount/unmount messages.

3. Include a DRBD build in the stemcell.

4. Change the BOSH agent to configure and restart DRBD, updating the persistent store to point at the DRBD device mapping.

5. Switch to using LVM on top of the partition.

6. Add a dynamic DNS update to the BOSH agent on startup.

7. Add a DRBD health monitor and the ability to report its status to the client.

8. Fix disk migration, which the above will have broken, probably by using LVM and live migration.
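Steps 1 and 2 together change the order in which the agent brings a job up and down. A minimal sketch of that ordering, assuming a simplified spec with only a passive flag and a persistent-disk flag (JobSpec, start_actions and stop_actions are hypothetical names, not part of the BOSH agent):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobSpec:
    # Hypothetical subset of the job spec that matters for passive mode.
    passive: bool = False
    has_persistent_disk: bool = True


def start_actions(spec: JobSpec) -> list[str]:
    """Ordered actions a modified agent might take when asked to start a job."""
    if spec.passive:
        # Passive node: leave the replicated disk unmounted and the job stopped.
        return []
    actions = []
    if spec.has_persistent_disk:
        # Mount only now, immediately before the job starts (step 2).
        actions.append("mount_persistent_disk")
    actions.append("start_job")
    return actions


def stop_actions(spec: JobSpec) -> list[str]:
    """Ordered actions on stop: stop the job first, then release the disk."""
    actions = ["stop_job"]
    if spec.has_persistent_disk:
        actions.append("unmount_persistent_disk")
    return actions


print(start_actions(JobSpec(passive=True)))   # []
print(start_actions(JobSpec(passive=False)))  # ['mount_persistent_disk', 'start_job']
```

The sketch assumes the stop actions are safe to repeat on a node that never started, which keeps the stop path identical for active and passive nodes.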

Steps 1-3 are done; I'm currently working on 4.
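For the health monitor in step 7, one option on DRBD 8.x is to read the connection state (cs), roles (ro) and disk states (ds) from /proc/drbd. The sketch below assumes that 8.x status-line format (DRBD 9 no longer reports per-device state there) and treats a device as healthy only when it is Connected and UpToDate on both sides; parse_proc_drbd and is_healthy are hypothetical helpers, and the sample status is illustrative:

```python
import re

# Matches a DRBD 8.x /proc/drbd device line such as:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate A r-----
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+ro:(?P<local_role>\w+)/(?P<peer_role>\w+)"
    r"\s+ds:(?P<local_disk>\w+)/(?P<peer_disk>\w+)"
)


def parse_proc_drbd(text: str) -> dict:
    """Map each DRBD minor number to its connection, role and disk states."""
    devices = {}
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:  # skips the version line and the ns:/nr: counter lines
            fields = match.groupdict()
            devices[int(fields.pop("minor"))] = fields
    return devices


def is_healthy(device: dict) -> bool:
    """Healthy here means replicating, with both sides fully in sync."""
    return (
        device["cs"] == "Connected"
        and device["local_disk"] == "UpToDate"
        and device["peer_disk"] == "UpToDate"
    )


sample = """version: 8.4.3 (api:1/proto:86-101)
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown A r-----
    ns:0 nr:0 dw:0 dr:656 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""
status = parse_proc_drbd(sample)
print(status[0]["cs"], is_healthy(status[0]))  # WFConnection False
```

A resyncing pair (cs:SyncSource or cs:SyncTarget) also reports as unhealthy here; the monitor could surface those states separately rather than as failures.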

The new job specification will look something like:

- instances: 1
  name: postgres_test2
  networks:
  - name: cf1
    static_ips:
    - 10.93.230.19
  persistent_disk: 4096
  passive: true
  drbd:
    enabled: true
    force_master: false
    replication_node1: 10.93.228.18
    replication_node2: 10.92.230.19
    replication_type: A
    secret: mysecret
  dns_register_on_start: myservices.something.com
  properties:
    db: databases
  release: cf
  resource_pool: medium_z1
  template: postgres
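The drbd block maps fairly directly onto a DRBD resource file; replication_type: A corresponds to DRBD's asynchronous protocol A. A sketch of how the agent might render it, assuming DRBD 8.4 syntax and assuming the hostnames, the backing partition (/dev/sdc1), the DRBD device and port 7789, none of which appear in the job spec above (drbd_resource_config is a hypothetical name):

```python
def drbd_resource_config(resource, drbd, hostnames, disk,
                         device="/dev/drbd0", port=7789):
    """Render a DRBD 8.4 resource file from the job's drbd: block.

    hostnames maps each replication node IP to the hostname DRBD matches in
    its `on` sections. hostnames, disk, device and port are assumptions: the
    job spec does not carry them.
    """
    secret = drbd["secret"]
    lines = [
        f"resource {resource} {{",
        "  net {",
        f"    protocol {drbd['replication_type']};",  # A = asynchronous
        "    cram-hmac-alg sha1;",  # peer authentication uses this with shared-secret
        f'    shared-secret "{secret}";',
        "  }",
    ]
    for ip in (drbd["replication_node1"], drbd["replication_node2"]):
        lines += [
            f"  on {hostnames[ip]} {{",
            f"    device {device};",
            f"    disk {disk};",
            f"    address {ip}:{port};",
            "    meta-disk internal;",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


drbd_spec = {
    "enabled": True,
    "force_master": False,
    "replication_node1": "10.93.228.18",
    "replication_node2": "10.92.230.19",
    "replication_type": "A",
    "secret": "mysecret",
}
# Hypothetical hostnames for the two replication nodes.
hosts = {"10.93.228.18": "pg-main", "10.92.230.19": "pg-dr"}
print(drbd_resource_config("r0", drbd_spec, hosts, disk="/dev/sdc1"))
```

Both nodes render the same file, so the agent on either side can write it unchanged and let DRBD pick its own `on` section by hostname.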

Obviously, this means you won't be able to run more than one instance of these types of servers.
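The director could enforce this single-instance limit at deploy time. A minimal sketch of such a check, assuming jobs are the parsed dicts from the manifest's jobs list (validate_passive_jobs is a hypothetical name):

```python
def validate_passive_jobs(jobs):
    """Return an error for each passive/DRBD job with more than one instance.

    jobs is assumed to be the parsed `jobs:` list from the deployment manifest.
    """
    errors = []
    for job in jobs:
        uses_drbd = job.get("drbd", {}).get("enabled", False)
        replicated = uses_drbd or "passive" in job
        instances = job.get("instances", 1)
        if replicated and instances != 1:
            errors.append(
                f"{job['name']}: passive/DRBD jobs need exactly 1 instance, "
                f"got {instances}"
            )
    return errors


manifest_jobs = [
    {"name": "postgres_test2", "instances": 1, "passive": True,
     "drbd": {"enabled": True}},
    {"name": "postgres_bad", "instances": 2, "drbd": {"enabled": True}},
]
print(validate_passive_jobs(manifest_jobs))
# ['postgres_bad: passive/DRBD jobs need exactly 1 instance, got 2']
```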

We are currently planning to have two BOSH directors split across two datacentres. A planned cutover would involve setting the main site to passive and running a bosh deploy, then setting the DR site to not passive and running another deploy. In a disaster, we would have to set force_master in the DR site. We would not want to cut data services over to the DR site automatically, without a human checking first, because of the potential for data loss.
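The cutover rules above can be expressed as a small decision the agent makes after each deploy. This is a hypothetical policy sketch, assuming DRBD 8.4 drbdadm syntax and that the agent knows whether it is connected to its peer (drbd_role_command is an illustrative name). It never promotes a disconnected node unless force_master is set, so failing over in a disaster stays a human decision:

```python
def drbd_role_command(resource, passive, force_master, peer_connected):
    """Choose the drbdadm command to run after a deploy (hypothetical policy).

    Returns None when the node should not change role automatically.
    """
    if passive:
        # Passive site: stay (or become) Secondary and keep the job stopped.
        return ["drbdadm", "secondary", resource]
    if peer_connected:
        # Planned cutover: the other site was set passive in an earlier
        # deploy, so it is already Secondary and a normal promotion works.
        return ["drbdadm", "primary", resource]
    if force_master:
        # Disaster: an operator has set force_master in the DR site.
        return ["drbdadm", "primary", "--force", resource]
    # Peer unreachable and no override: wait for a human decision.
    return None


# Planned cutover, second deploy: the DR site is redeployed with passive: false.
print(drbd_role_command("r0", passive=False, force_master=False,
                        peer_connected=True))
# ['drbdadm', 'primary', 'r0']
```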

Any application considered highly available will not use these services and will instead use active/active services. However, we run a whole bunch of applications where consistency is valued over availability, and for those, active/passive instances suit our needs better.

It would be interesting to hear people's thoughts about this.

Ben.

Greg Oehmen

Jun 20, 2014, 5:49:10 PM

to bosh...@cloudfoundry.org

Hey Ben:

Pretty cool. From my perspective, a key element is your use case definition: "Any application considered high availability will not use these services and instead use active active services. However we do run a whole bunch of applications where consistency is valued over availability where having active/passive instances suits our needs better."

Like you, I'm looking forward to hearing the thoughts of others on this.

Best

Greg

Greg Oehmen

Cloud Foundry Product Manager - Bosh

Pivotal
