Active/passive nodes with BOSH.


Ben

Jun 18, 2014, 6:11:05 AM
to bosh...@cloudfoundry.org
As part of our DR strategy we require passive nodes in our DR site for some services. Rather than going with a different strategy for each of the different services, we'd like to standardise on asynchronous disk storage replication.

We have forked BOSH, and the way I'm implementing this is by modifying the BOSH director and agent to do the following:

1. Add a passive mode to the job definition which prevents jobs from being started while it is set.
2. Change the BOSH agent to mount/unmount storage around job start/stop rather than in response to the mount/unmount messages.
3. Include a DRBD build in the stemcell.
4. Change the BOSH agent to configure and restart DRBD, updating the persistent store to point at the DRBD device (sketches for steps 4-6 follow this list).
5. Switch over to using LVM on top of the partition.
6. Implement a dynamic DNS update in the BOSH agent on startup.
7. Add a DRBD health monitor and functionality to report status to the client.
8. Fix disk migration, which the above will have broken, probably using LVM and live migration.
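
For step 4, a minimal sketch of the resource file the agent might render from the drbd block in the spec below, assuming DRBD 8.x, a resource named "store" and /dev/sdc1 as the persistent disk partition (resource name, device paths and port are all illustrative):

# /etc/drbd.d/store.res (hypothetical path and resource name)
resource store {
  protocol A;                      # asynchronous replication, i.e. replication_type: A
  net {
    cram-hmac-alg sha1;
    shared-secret "mysecret";      # the drbd.secret property
  }
  on node1 {                       # section names must match each node's `uname -n`
    device    /dev/drbd0;
    disk      /dev/sdc1;           # assumed persistent disk partition
    address   10.93.228.18:7789;   # replication_node1
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdc1;
    address   10.92.230.19:7789;   # replication_node2
    meta-disk internal;
  }
}

For step 5, one possible layout is to carve an LV out of the partition and point DRBD's disk at it, so the backing store can later be grown or moved for step 8 (again, names are illustrative):

pvcreate /dev/sdc1
vgcreate vg_store /dev/sdc1
lvcreate -l 100%FREE -n store vg_store
# then set `disk /dev/vg_store/store;` in the resource file above

And for step 6, the agent could simply shell out to nsupdate with a TSIG key (nameserver and key path are assumptions):

nsupdate -k /etc/bind/ddns.key <<EOF
server ns1.something.com
zone something.com
update delete myservices.something.com A
update add myservices.something.com 60 A 10.93.230.19
send
EOF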

1-3 are done; I'm currently working on 4.

The new job specification will look something like:

- instances: 1
  name: postgres_test2
  networks:
  - name: cf1
    static_ips:
    - 10.93.230.19
  persistent_disk: 4096
  passive: true
  drbd:
    enabled: true
    force_master: false
    replication_node1: 10.93.228.18
    replication_node2: 10.92.230.19
    replication_type: A
    secret: mysecret
  dns_register_on_start: myservices.something.com
  properties:
    db: databases
  release: cf
  resource_pool: medium_z1
  template: postgres

Obviously this means you won't be able to use more than one instance for these types of servers.

We are currently planning on having two BOSH directors split across two datacentres. A cutover would involve changing the main site to 'passive', doing a bosh deploy, then changing the DR site to 'not passive' and doing another deploy. In a disaster it would be a case of having to use force_master in the DR site. We would not want to automatically cut data services over to the DR site without human eyes on it, given the potential for data loss.
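
Underneath force_master, the agent would presumably end up driving DRBD role changes directly; in drbdadm terms the two paths might look like this (real commands, hypothetical resource name "store"):

# planned cutover: jobs stop on the main site, it demotes, then DR promotes
drbdadm secondary store          # main site, during the 'passive' deploy
drbdadm primary store            # DR site, during the 'not passive' deploy

# disaster: main site unreachable, so promote anyway, accepting the loss
# of any writes not yet replicated (protocol A is asynchronous)
drbdadm primary --force store    # roughly what force_master would translate to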

Any application considered highly available will not use these services and will instead use active/active services. However, we do run a whole bunch of applications where consistency is valued over availability, and for those active/passive instances suit our needs better.

It would be interesting to hear people's thoughts on this.

Ben.

Greg Oehmen

Jun 20, 2014, 5:49:10 PM
to bosh...@cloudfoundry.org
Hey Ben:

Pretty cool. From my perspective, a key element is your use case definition: "Any application considered highly available will not use these services and will instead use active/active services. However, we do run a whole bunch of applications where consistency is valued over availability, and for those active/passive instances suit our needs better."

Like you, I'm looking forward to hearing the thoughts of others on this.

Best

Greg

Greg Oehmen
Cloud Foundry Product Manager - Bosh
Pivotal

