Bosh aws deployment - unable to alter root disk size

MT

unread,

Jan 30, 2014, 12:30:02 AM1/30/14

to bosh-...@cloudfoundry.org

Hi Bosh mailing list,

First, although it's not overly relevant to my problem, some background on what I'm doing and my progress so far. I'm attempting to use a combination of Bosh and Chef together to automate a cloud deployment on top of AWS. This is actually a stop gap until Docker matures and then I'll probably transition to using Bosh to deploy VM's and setup my network, Docker to encapsulate services (running as a Bosh job), and Chef to install stuff into the Docker images. Why a combination of all 3 when each of them is competing for the same space? After diving into each of them I think they can all take on complimentary roles in automating deployments (that play to their strengths). While I agree in principle that installing everything from source is the most robust and least error prone solution, in practice as a developer I just want a clean way to both quickly bring up complete environments and then to easily update my code base across those environments (hopefully easily supporting local install along the way). If anyone is curious I'm happy to go into this further, but for now I just thought I'd mention it as it's probably a slightly non-standard use case.

Anyways, progress so far: I've managed to build my own stemcell, upgrade it to Ubuntu 12.04.3 with a 3.8 kernel (so it can support Docker), and add in a Chef stage to install Chef and Berkshelf into the stemcell image. The stemcell uploads without issue and I can deploy jobs onto it. I am currently branched off the 1798 release, I'm happy to send a patch if you'd like to see my change set. As an aside, I ran into lots of issues trying to get a 13.XX stemcell working, so I stuck with upgrading the 12.04.3 kernel.

Next, I have a micro-bosh deployed on top of AWS and I can successfully use it to deploy jobs. I've created several test jobs and packages, the one I'm currently focusing on is an apt repository which Chef will pull from. My jobs use a wrapper service that monit invokes which registers the box with Chef / does a sync / de-registers with Chef on server shutdown, and everything seems to work fairly well (For the apt box monit is monitoring nginx, which is serving up the repository).

The issue I'm running into is that no matter how I change my manifest, the root file system is always coming up as 2 gigs. The nginx install thus fails to due lack of space on the drive. I've tried altering both the resource_pools and jobs sections of the manifest, however Bosh just seems to ignore the configuration when creating the root ebs volume. See below for an excerpt from my apt-repository.yml manifest:

resource_pools:
- name: common
network: default
size: 1
stemcell:
    name: bosh-aws-xen-ubuntu
    version: 1798
cloud_properties:
    instance_type: m1.medium
    disk: 8192

jobs:
- name: apt
template:
    - apt
instances: 1
resource_pool: common
networks:
- name: default
    default:
    - dns
    - gateway
cloud_properties:
    instance_type: m1.medium
    disk: 8192

I've also tried updating the manifest of the micro-bosh instance and redeploying (--update) it to see if it would re-size the disk, however that did not work either (it re-attached the same ebs volume, maybe I need to completely wipe out the micro bosh instance?). I then tried altering the default disk size and rebuilding my stemcell, with the end result of bosh upload stemcell error-ing out with:

E, [2014-01-30T03:37:35.024803 #4961] [task:256] ERROR -- : Unable to copy stemcell root image: command 'sudo -n /var/vcap/jobs/director/bin/stemcell-copy /var/vcap/data/tmp/director/stemcell20140130-4961-fv3nch/image /dev/xvdg 2>&1' failed with exit code 1

From the log it looks as though the volume gets created with the following properties:

I, [2014-01-30T03:35:05.798118 #4961] [task:256] INFO -- : Found stemcell image `bosh-aws-xen-ubuntu/1798', cloud properties are {"name"=>"bosh-aws-xen-ubuntu", "version"=>"1798", "infrastructure"=>"aws", "architecture"=>"x86_64", "root_device_name"=>"/dev/sda1"}

So it appears as though the copy failed due to the stemcell image being larger then the underlying volume (which makes me think this is the wrong direction and I should be hooking into another place in the deployment process that occurs before the stemcell gets copied, ie disk should be listed as one of the cloud properties above...).

I'm going to continue working through the ruby code to try and understand exactly where in the deployment process the volume is being created, however any help pointing me in the right direction would be greatly appreciated. Another option is change all my cookbooks to install into a chroot'd environment on top of /var/vcap/store (created using persistent_disk, which does work), perhaps this is a better direction from a Bosh standpoint (although and I don't look forward to having to rework my cookbooks...)?

Lastly, many thanks for open sourcing Bosh, I think it has the potential to be a very powerful tool. My only concern is that I feel some of the decisions are more in the direction of locking the user into the Bosh use case then creating a pluggable framework, but it's very likely I have that perception due to my lack of understanding of the project (or the fact that I'm a developer moonlighting as a dev-ops engineer...).

Some debug output to help the cause:

mt@mt:~/workspace/bosh/releases/apt-repository$ bosh status
Config
             /home/mt/.bosh_config

Director
Name       aws
URL        https://XX.XX.XX.XX:25555
Version    1.5.0.pre.912 (release:70118c29 bosh:70118c29)
User       admin
UUID       XXXXXXXXXXXXXXXXXXXXXXX
CPI        aws
dns        enabled (domain_name: microbosh)
compiled_package_cache disabled
snapshots disabled

Deployment
Manifest   /home/mt/workspace/bosh/releases/apt-repository/apt-repository.yml

Release
dev        apt/0.2-dev
final      n/a

mt@mt:~/workspace/bosh/releases$ bosh stemcells

+---------------------+---------+--------------+
| Name                | Version | CID          |
+---------------------+---------+--------------+
| bosh-aws-xen-ubuntu | 1782    | ami-XXXXXXX |
| bosh-aws-xen-ubuntu | 1798    | ami-XXXXXXX |
+---------------------+---------+--------------+

(*) Currently in-use

Stemcells total: 2

Best,

- MT

Matthew Boedicker

unread,

Jan 30, 2014, 1:24:02 AM1/30/14

to bosh-...@cloudfoundry.org

Hi MT,

Normally the root partition of the stemcell is not resized during deployment.

Bosh jobs and packages from a bosh release will go into /var/vcap/data (the ephemeral disk) when deployed.

Jobs with persistent disks will have a persistent disk mounted at /var/vcap/store. This can be specified per-job in the deployment manifest as the persistent_disk property.

It sounds like you are doing something a little different, but the typical way is to write create a bosh release for your applications and configure your applications to put their persistent data in /var/vcap/store.

Matt

To unsubscribe from this group and stop receiving emails from it, send an email to bosh-users+...@cloudfoundry.org.

Ferran Rodenas

unread,

Jan 30, 2014, 1:30:01 AM1/30/14

to bosh-...@cloudfoundry.org

I found your use case very interesting, the combination of Bosh + (Docker|Warden) is something that I've been thinking about recently and I'll want to explore deeply. But for now, discussions aside about how Bosh works (we can start another thread on this), to solve your problem you'll need to hack the stemcell builder process if you don't want to modify your existing cookbooks.

By default, the AWS CPI is going to create a 2Gb root volume [1]. To override that size, there should be a 'disk' property at the stemcell properties, that is read from the stemcell file (the 'disk' property you used at your manifest is ignored by the CPI). Right now, the only way to override this is to modify the AWS stemcell builder invocation [2] to set the root size of the image and the 'stemcell' stage [3] to add a property like:

manifest = {

"name" => stemcell_name,

"version" => version,

"bosh_protocol" => bosh_protocol,

"sha1" => stemcell_checksum,

"cloud_properties" => {

"name" => stemcell_name,

"version" => version,

"infrastructure" => stemcell_infrastructure,

"architecture" => "x86_64",

"root_device_name" => "/dev/sda1",

"disk" => desire_size_in_mb

}

Before hacking the stemcell builder, and just to check if this works, you can use your preexisting stemcell (ie bosh-stemcell-1868-aws-xen-ubuntu.tgz), untar it, modify the stemcell.MF file with the above 'disk' property, and tar the stemcell again. After you upload the new stemcell, you should see that the root disk size is what you set at the stemcell.MF 'disk' property. Be careful when doing this, as you should use your altered stemcell (the one you increased the image size), if not, you'll get a bigger root disk, but the root partition will still be only 2Gb.

Hope this helps.

[1] https://github.com/cloudfoundry/bosh/blob/master/bosh_aws_cpi/lib/cloud/aws/cloud.rb#L395

[2] https://github.com/cloudfoundry/bosh/blob/master/bosh-stemcell/lib/bosh/stemcell/infrastructure.rb#L52

[2] https://github.com/cloudfoundry/bosh/blob/master/stemcell_builder/stages/stemcell/apply.sh#L30

- Ferdy

2014-01-29 MT <mgtay...@gmail.com>:

To unsubscribe from this group and stop receiving emails from it, send an email to bosh-users+...@cloudfoundry.org.

Mike Taylor

unread,

Jan 30, 2014, 9:17:12 PM1/30/14

to bosh-...@cloudfoundry.org

Matthew, Ferran,

Thank you both very much for the quick replies and detailed information. I was missing step #3 from Ferran's post, after editing the manifest the volume is properly re-sized and everything comes up correctly.

I do have one follow up question based on Matthew's explanation of Bosh. Does Bosh currently (or plan to) support mounting multiple persistent volumes to a VM? My use case is deploying something like Kafka (or Zookeeper), where the recommendation is to place the data and applications logs onto separate disks to avoid disk contention. As a follow up question to this, does Bosh have the concept of specifying different type of persistent disks (ie, provisioned IOPs)?

Best,

- MT

Ferran Rodenas

unread,

Jan 30, 2014, 10:24:01 PM1/30/14

to bosh-...@cloudfoundry.org

Glad to hear it worked.

Supporting multiple persistent disks and/or specifying different types of disks is something that has been discussed several times, but AFAIK there isn't any short-term plans to support them, at least, I don't see any related story at the Bosh tracker: https://www.pivotaltracker.com/s/projects/956238.

- Ferdy

2014-01-30 Mike Taylor <mgtay...@gmail.com>:

Dr Nic Williams

unread,

Jan 30, 2014, 10:58:50 PM1/30/14

to bosh-...@cloudfoundry.org, bosh-...@cloudfoundry.org

MT, the /var/vcap/data is mounted on the ephemeral disk for your flavor (where logs are commonly stored by many bosh releases such as cf-release); and is a different disk from the persistent disk at /var/vcap/store (which is an EBS volume on AWS). Helpful?

Reply all

Reply to author

Forward