Fwd: [vcap-dev] CPI impl for VCloud: attachdisk, detachdisk, how?

skaar

May 4, 2012, 12:56:25 PM

to bosh...@cloudfoundry.org

Forwarding this to bosh-dev

/skaar

---------- Forwarded message ----------
From: Guillaume Berche <ber...@gmail.com>
Date: Fri, May 4, 2012 at 9:54 AM
Subject: [vcap-dev] CPI impl for VCloud: attachdisk, detachdisk, how?
To: vcap-dev <vcap...@cloudfoundry.org>

Hello,

I read in http://www.slideshare.net/marklucovsky/cloud-foundry-anniversary-technical-slides/
(slide 15) that VCD support is planned for H2 2012.

Looking at https://github.com/cloudfoundry/bosh/blob/master/cpi/lib/cloud.rb,
it seems to me that the attach_disk and detach_disk methods don't map to
any existing verbs in the vCloud 1.5 API
(http://www.vmware.com/pdf/vcd_15_api_guide.pdf), i.e. there is no way to
move disks between VMs without cloning vApps.

Which bosh features would be impacted if attach_disk and detach_disk were
only partially implemented in a vCD CPI impl? Briefly looking at the source,
it seems that the InstanceUpdater is a heavy user of them. I need to
look more into it and into the test specs, but if someone has some pointers
to give me, I'd be grateful.

Is there any workaround possible before some new verbs get added in a
future version of the VCD API ?

Thanks in advance for your help on this,

Guillaume.

Oleg Shaldibin

May 4, 2012, 1:47:19 PM

to bosh...@cloudfoundry.org

It's OK not to have a direct mapping. Both methods have a simple contract:

1. both methods receive a tuple of (instance_id, disk_id) as an input;

2. If attach_disk succeeds, BOSH agent can read a mapping from disk_id to some infrastructure-specific disk tag (in the case of vSphere it's a SCSI-bus device number, in the case of AWS it's a block device name);

3. If detach_disk succeeds, BOSH agent no longer has the mapping.

It's up to CPI to orchestrate the actual IaaS behavior to fulfill the contract and it's up to agent to know how to treat this new disk tag in the context of current infrastructure. In your example it could clone vApp behind the scenes, attach disk and destroy the old vApp, or something similar.
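
To make the contract concrete, here is a minimal Python sketch. It is not the real BOSH CPI (which is Ruby): the class name, the agent_settings dict and the integer "unit number" tag are all assumptions chosen for illustration. It only shows the observable effect: after attach_disk the agent can look up a tag for the disk, and after detach_disk it cannot.

```python
class InMemoryCPI:
    """Toy CPI that tracks only the attach/detach contract; no IaaS calls."""

    def __init__(self):
        # instance_id -> {disk_id: device_tag}
        # Hypothetical stand-in for whatever settings the agent reads.
        self.agent_settings = {}

    def attach_disk(self, instance_id, disk_id):
        disks = self.agent_settings.setdefault(instance_id, {})
        if disk_id in disks:
            raise ValueError(f"{disk_id} is already attached to {instance_id}")
        # Assumed tag format: the lowest free unit number on this instance.
        used = set(disks.values())
        unit = 1
        while unit in used:
            unit += 1
        disks[disk_id] = unit

    def detach_disk(self, instance_id, disk_id):
        disks = self.agent_settings.get(instance_id, {})
        if disk_id not in disks:
            raise KeyError(f"{disk_id} is not attached to {instance_id}")
        del disks[disk_id]

    def agent_lookup(self, instance_id, disk_id):
        """What the agent would see: a device tag, or None if not mapped."""
        return self.agent_settings.get(instance_id, {}).get(disk_id)
```

How the CPI gets the IaaS into a state that matches the mapping (native hot-add, cloning, or something else) is hidden behind these two methods.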

--
Best,

Oleg

Patrick Bozeman

May 4, 2012, 6:36:52 PM

to bosh...@cloudfoundry.org

Fyi, there is a bosh-dev mailing list, which I added to the thread.

Guillaume Berche

May 8, 2012, 5:04:26 PM

to bosh-dev

Sorry for the lack of precision and for posting to the wrong group.

The VCD 1.5 API does not allow moving disks between VMs. Only a copy
(called "clone") of a VM along with its disks is supported, and the VM
needs to be in the powered-off state for this operation to succeed.

This VCD 1.5 limitation does not seem to mix well with bosh's attempt to
pool agent VM instances and give them successive personalities, hence
my question on whether the bosh team was expecting to rely on a future
version of the VCD API without this limitation.

I understand bosh is using the attach_disk and detach_disk methods
when a vm gets its personality and starts acting as a job with one or
more persistent disks. The assumption is that the attach_disk and
detach_disk operations are fast, and allow for efficient pooling of
VMs, and externalizing job states into dynamically-attached disks.

Are there other uses of the attach_disk/detach_disk methods besides
jobs with persistent disks? Are there cases where disks would be
attached in different groups, such as in the following sequence?

1. create_vm (disk_locality={}): vm1
2. create_disk(10GB, vm_locality=vm1): disk1
3. attach_disk (vm1, disk1)
4. detach_disk(vm1, disk1)
5. create_disk(20GB, vm_locality=vm1): disk2
6. attach_disk (vm1, disk1)
7. attach_disk (vm1, disk2)
8. detach_disk(vm1, disk1)
9. detach_disk(vm1, disk2)
10. delete_vm(vm1)
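
To see which disks end up sharing a VM over the course of such a sequence, the attach/detach calls can be replayed in a few lines of Python. This is only a sketch: the tuple encoding of the calls and the function name are my own, not part of bosh.

```python
def attachment_groups(calls):
    """Return the distinct non-empty sets of disks seen attached together."""
    attached = {}  # vm_id -> set of currently attached disk ids
    groups = []    # distinct groupings, in order of first appearance
    for op, vm_id, disk_id in calls:
        disks = attached.setdefault(vm_id, set())
        if op == "attach_disk":
            disks.add(disk_id)
        elif op == "detach_disk":
            disks.discard(disk_id)
        else:
            raise ValueError(f"unexpected operation: {op}")
        snapshot = frozenset(disks)
        if snapshot and snapshot not in groups:
            groups.append(snapshot)
    return groups


# Steps 3, 4 and 6 to 9 of the sequence above (create/delete calls omitted).
calls = [
    ("attach_disk", "vm1", "disk1"),
    ("detach_disk", "vm1", "disk1"),
    ("attach_disk", "vm1", "disk1"),
    ("attach_disk", "vm1", "disk2"),
    ("detach_disk", "vm1", "disk1"),
    ("detach_disk", "vm1", "disk2"),
]
groups = attachment_groups(calls)
```

For this sequence the result holds three groupings: disk1 alone, disk1 with disk2, and disk2 alone. That is the kind of regrouping a clone-based emulation would have trouble with.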

What is the expected VM state after attach_disk, detach_disk and
create_vm? Can you confirm the guest OS (along with the bosh agent) is
expected to be running at that time?

I tried to imagine different scenarios to get close to the CPI
contract with an IaaS running the VCD 1.5.1 API:
A- Use an NFS or iSCSI appliance to dynamically create NAS or SAN
volumes that can then be dynamically mounted into VMs. This appliance
could be a virtual appliance instantiated on VCD or a physical
appliance somewhere else.

B- Try to emulate the CPI contract with the existing VCD 1.5.1
API. This would basically work if the same disks are always
attached in the same groups.
- B1. Emulate it with a copy of the VM to perform disk copies on the
detach_disk operation. This preserves the network attributes of a VM
when one of its disks is detached. This would fail if disks are
successively attached in different groups. This scenario is
detailed below.
- B2. Simply power down a VM on any call to detach_disk, then power
on the VM on any subsequent call to attach_disk with the same disk id.
This scenario is detailed below. This approach would fail if:
  1- disks are attached in different groups
  2- the Health manager detects the powered-off VM as nonexistent
  and tries to recreate another one. Seems unhealthy, what do you
  think?

Scenario A seems the easiest to implement but requires either
additional storage hardware outside of VCD, or a virtual appliance
exposing NFS/iSCSI as a vApp.
Any feedback on the performance penalty of such a setup w.r.t. native
disk devices in VCD?

Scenario B1 is more complicated, and seems to defeat the pooling
mechanism. Scenario B2 might break the assumption that a VM is expected to
be running after a detach_disk operation. Any other ideas that would
allow testing bosh on a vCloud Director 1.5 IaaS?

Below are details about scenarios B1 and B2.

Guillaume.

------------------

Detailed scenario B1 with one vApp holding a single bosh agent VM,
called "vm1", with two disks: disk0 (holding the stemcell) and disk1
(previously attached)
- detach_disk(vm1, disk1):
  1. power off the vApp (to be able to make a copy of it)
  2. use cloneVapp https://www.vmware.com/support/vcd/doc/rest-api-doc-1.5-html/operations/POST-CloneVApp.html
  to save disk1 in persistent storage (i.e. another VM). Let's
  name the copy of the vApp "disk1_copy_vm"
  3. remove disk1 from it using
  https://www.vmware.com/support/vcd/doc/rest-api-doc-1.5-html/operations/PUT-Disks.html
  4. power on the vApp

- create_vm(agent_id, stem_cellbosh-agent, resource_pool, networks):
  if disk_locality=nil:
  1. instantiate a vApp from the stem_cell_bosh_agent vApp template
  (single VM)
  2. power on the vApp
  3. return id (let's call it "vm2")

  if disk_locality!=nil:
  don't do anything, wait for the attach_disk call(s) to fire

- attach_disk(vm2, disk1):
  1. instantiate a vApp from the vApp "disk1_copy_vm".
  2. power on the VM.

- attach_disk(vm2, disk2): assuming disk2 was another disk previously
created and saved into "disk2_copy_vm"
  If disk2 was previously detached from a VM that was not holding
  disk1, then the only way we have to copy the disk content to vm2
  would be:
  1. stop vm2.
  2. create a new empty disk21 in vm2 using
  https://www.vmware.com/support/vcd/doc/rest-api-doc-1.5-html/operations/PUT-Disks.html
  3. start vm2
  4. start "disk2_copy_vm"
  5. using guest OS collaboration, copy the content of
  disk2_copy_vm.disk2 into vm2.disk2. This implies transferring every
  byte through the network between the two VMs, and handling the raw
  device accesses.
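
As a rough illustration of the B1 detach step only, the four operations could be orchestrated as in the Python sketch below. The client class is purely hypothetical: it records calls instead of talking to vCD, and its method names are placeholders for the cloneVApp and PUT Disks REST operations linked above, not a real SDK.

```python
class RecordingVcdClient:
    """Hypothetical stand-in for a vCD 1.5 REST client; it only logs calls."""

    def __init__(self):
        self.calls = []

    def power_off(self, vapp_id):
        self.calls.append(("power_off", vapp_id))

    def clone_vapp(self, source_vapp_id, copy_name):
        self.calls.append(("clone_vapp", source_vapp_id, copy_name))

    def remove_disk(self, vapp_id, disk_id):
        self.calls.append(("remove_disk", vapp_id, disk_id))

    def power_on(self, vapp_id):
        self.calls.append(("power_on", vapp_id))


def detach_disk_b1(client, vapp_id, disk_id):
    """Scenario B1 detach: park the disk in a clone, then drop it from the source."""
    copy_name = f"{disk_id}_copy_vm"
    client.power_off(vapp_id)  # cloning requires a powered-off vApp
    try:
        client.clone_vapp(vapp_id, copy_name)  # the clone keeps a copy of the disk
        client.remove_disk(vapp_id, disk_id)   # the source vApp loses the disk
    finally:
        # Assumption: always try to bring the agent VM back, even on failure.
        client.power_on(vapp_id)
    return copy_name
```

Note that the agent VM is down for the whole duration of the clone, so this detach is far from the fast operation that pooling assumes.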

Detailed scenario B2: creating vm1 for job1 requiring disk1 and
disk2, and then moving disk1 and disk2 to vm2

//vm1 created for running job1 requiring disk1 and disk2
- create_vm(agent_id, stem_cellbosh-agent, resource_pool, networks):
  disk_locality=nil
  1. instantiate a vApp from the stem_cell_bosh_agent vApp template
  (single VM)
  2. power on the vApp
  3. return id (let's call it "vm1")

- create_disk(10GB, vm1):
  don't do anything yet, wait for the attach_disk call(s) to fire
  return "disk1"

- create_disk(20GB, vm1):
  don't do anything yet, wait for the attach_disk call(s) to fire
  return "disk2"

- attach_disk(vm1, disk1):
  1. instantiate a vApp from the stem_cell_bosh_agent vApp template
  (single VM)
  2. modify hardware to add another disk
  3. power off/on the VM to have it take effect

- attach_disk(vm1, disk2):
  1. modify hardware to add another disk
  2. power off/on the VM to have it take effect

- detach_disk(vm1, disk1):
  power off vm1

- detach_disk(vm1, disk2):
  don't do anything

//Health manager detects vm1 as not responding. Would the
ProblemsHandler ask to delete it?

//A second VM is later requested to run job1 and then to attach disk1
and disk2

- create_vm(agent_id, stem_cellbosh-agent, resource_pool, networks):
  vm2
  disk_locality=disk1,disk2:
  don't do anything, wait for the attach_disk call(s) to fire
  return "vm2"

- attach_disk(vm2, disk1):
  start vm1

- attach_disk(vm2, disk2):
  don't do anything
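
The B2 bookkeeping can be sketched as a small Python state machine. Everything here is an assumption made for illustration (class name, the alias table that maps the id returned by create_vm to the VM that physically holds the disks); no vCD call is made.

```python
class PowerCycleCPI:
    """Toy model of scenario B2: detach powers a VM off, attach powers it on."""

    def __init__(self):
        self.powered_on = {}  # physical vm id -> bool
        self.disk_home = {}   # disk id -> physical vm id holding the disk
        self.alias = {}       # id handed to bosh -> physical vm id

    def create_vm(self, vm_id, disk_locality=None):
        if disk_locality:
            # Reuse the (powered-off) VM that already holds these disks.
            homes = {self.disk_home[d] for d in disk_locality}
            if len(homes) != 1:
                raise ValueError("disks are attached in different groups")
            self.alias[vm_id] = homes.pop()
        else:
            self.alias[vm_id] = vm_id
            self.powered_on[vm_id] = True
        return vm_id

    def attach_disk(self, vm_id, disk_id):
        physical = self.alias[vm_id]
        home = self.disk_home.setdefault(disk_id, physical)
        if home != physical:
            raise ValueError(f"{disk_id} lives on {home}, cannot move it")
        self.powered_on[physical] = True

    def detach_disk(self, vm_id, disk_id):
        # The disk never leaves the VM; the VM is simply powered off.
        self.powered_on[self.alias[vm_id]] = False


cpi = PowerCycleCPI()
cpi.create_vm("vm1")
cpi.attach_disk("vm1", "disk1")
cpi.attach_disk("vm1", "disk2")
cpi.detach_disk("vm1", "disk1")  # vm1 is now powered off
cpi.detach_disk("vm1", "disk2")
cpi.create_vm("vm2", disk_locality=["disk1", "disk2"])
cpi.attach_disk("vm2", "disk1")  # powers the old vm1 back on
```

Between the first detach_disk and the next attach_disk the VM is off, which is exactly the window in which the health manager would see it as not responding.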

Jagannath Krishnan

May 10, 2012, 12:57:01 PM

to bosh-dev

Thanks for your email, Guillaume. We're working on a BOSH CPI for a
future version of vCloud Director. Once it is available publicly we will
also make the CPI implementation available.

Thanks,
Jagannath Krishnan
Staff Product Manager,
vCloud Director Team, VMware Inc.

Guillaume Berche

May 11, 2012, 5:42:08 PM

to bosh-dev

Thanks for your reply, Jagannath. Any hint as to when the new vCD version
and CPI impl could be available?

Thanks again Oleg for the overview of the agent integration, this was
very useful.

Guillaume.

Jagannath Krishnan

May 21, 2012, 10:26:58 AM

to bosh-dev

Hi Guillaume,
We're actively working on this. Unfortunately, I can't disclose
details of our VCD releases publicly.

Thanks,
Jagannath
