Forcing Openstack_CPI to use metadata only (no config_drive)


Matt Johnson

Apr 24, 2014, 2:56:47 PM4/24/14
to bosh...@cloudfoundry.org
Hi all,

The following is a quick writeup of an issue I've been having (part provider, part documentation, part my config) with getting bootstrap info for the bosh-agent into a VM.

It's here because I'm wondering if I've missed something that everyone else already knows, or if I should indeed get some pull requests cooked up to make metadata OR config_drive (instead of both) a reality.

It's also here to help anyone else who is in the same boat.

(Originally in markup, so this could be ugly!)

# MicroBOSH / BOSH-CLI Registry

The part the 'registry' component plays in the booting of VMs on OpenStack.

### What is the Registry?

It's a small HTTP server listening on your 'BOSH', which means:

If you're installing a MicroBOSH from the bosh-micro-cli on a cloudhammer:
  • It's going to run on the cloudhammer when you do a `bosh micro deploy <stemcell>`
If you're installing a BOSH or another application from a MicroBOSH:
  • It's going to run on the MicroBOSH VM
If you're installing another application from a full BOSH:
  • It will be running on one of your BOSH VMs, depending on your BOSH deployment

The purpose of the Registry (from what I can see) is to provide the VM with all the information the bosh-agent needs to configure the machine.

Items such as:

 - Packages
 - Blobs
 - Monit configuration
 - etc.
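To make "pull settings from the registry" concrete, here is a small sketch of unwrapping a registry reply. The endpoint path and envelope shape are my assumptions from debugging, not copied from the registry source:

```ruby
require "json"

# Sketch of unwrapping a registry response (shape assumed, not taken
# verbatim from the registry code): the body is a JSON envelope whose
# "settings" field is itself a JSON-encoded string of agent settings.
def unwrap_registry_response(body)
  envelope = JSON.parse(body)
  raise "registry error: #{envelope['status']}" unless envelope["status"] == "ok"
  JSON.parse(envelope["settings"])
end

# A fake reply, shaped like what we saw on the wire:
body = JSON.generate(
  "status"   => "ok",
  "settings" => JSON.generate(
    "agent_id" => "agent-123",
    "networks" => { "default" => { "ip" => "10.0.0.5" } }
  )
)

settings = unwrap_registry_response(body)
puts settings["agent_id"]
```

The double-encoding (JSON string inside a JSON envelope) is the part that surprised me when sniffing the traffic.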
 
Previously it was thought this was all done via the OpenStack metadata service (or the config drive if the metadata service is unavailable).

However...

### Stemcell boot and getting info into the VM

When Bosh-CLI/MicroBOSH/BOSH (xBOSH) boots a stemcell through the openstack_cpi, the VM gets its identity the following way **(based on my debugging)**:

- The nova API call 'xBOSH' makes includes some user-data/metadata. This includes:
  a. A 'Registry Endpoint' URL, defaulting to http://admin:admin@localhost:25695
  b. The hostname of the VM

- The nova API call 'xBOSH' makes also includes some more data in the 'Parameters' field of the call.
  This is OpenStack's 'config_drive' option, for passing data to the VM as a mounted CD/HDD. You shouldn't need both; metadata is meant to REPLACE config_drive, yet the Openstack_CPI seems to set both. The SSH key from OpenStack is only set in the Parameters, even though it could go in the metadata (or either).
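For reference, a sketch of the kind of user-data blob involved. The field names here are my inference from decoding our API calls, so treat them as illustrative rather than authoritative:

```ruby
require "json"

# Hypothetical sketch of the user-data blob the CPI sends (field names
# inferred from base64-decoding our nova API calls, not copied from the
# CPI source).
def build_user_data(server_name, registry_endpoint, public_key = nil)
  data = {
    "registry" => { "endpoint" => registry_endpoint },
    "server"   => { "name" => server_name }
  }
  # Today the public key only rides along in the 'Parameters' copy;
  # sending it here is what a metadata-only approach would look like.
  data["openssh"] = { "public_key" => public_key } if public_key
  data
end

payload = JSON.generate(
  build_user_data("vm-abc", "http://admin:admin@10.0.0.1:25695",
                  "ssh-rsa AAAA... fake-key")
)
```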


- The bosh-agent within the VM runs, and then:

a. Tries to retrieve information from the metadata service.

- Armed with the metadata info, the bosh-agent then expects one of two things:

a. To use the Registry Endpoint URL to connect to the Registry service on xBOSH and pull down more information to configure the VM to its final state (applications, Monit, whatever the BOSH release you're deploying to the VM wants).
b. *Or* to get exactly the same data from a .json file in a known location, placed there through the config_drive.
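The two-source behaviour above can be sketched in a few lines. This is a simplification, not the actual bosh-agent code (which lives in the agent's bootstrap path), and the fallback file location is an assumption:

```ruby
require "json"
require "tempfile"

# Simplified sketch of the fallback described above: prefer the remote
# (metadata/registry) path, fall back to the injected user_data.json.
def load_settings(fetch_remote, user_data_path)
  fetch_remote.call
rescue StandardError
  JSON.parse(File.read(user_data_path))
end

# Simulate the metadata service being down, with a config_drive-style
# JSON file present:
file = Tempfile.new("user_data")
file.write(JSON.generate("registry" => { "endpoint" => "http://admin:admin@10.0.0.1:25695" }))
file.close

settings = load_settings(-> { raise "metadata service unreachable" }, file.path)
puts settings["registry"]["endpoint"]
```

The key point: both sources carry the same payload, so a failure of one is invisible as long as the other works, which is exactly what bit us.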

### Why Config_drive (using parameters in the Openstack API call) is bad

Config drives are the user's data (usually JSON) from the 'Parameters' section of the OpenStack API call, put into a file and presented on a fake HDD/CD mount within the VM at boot time.

The problem is that these tiny HDD/CDs are stored locally on the hypervisor that boots the machine. This means that even if your VM is booted from Ceph or some other shared storage solution which provides mobility for your VM, the config_drive will pin that VM to the hypervisor node you're on.

Some bad scenarios:

- Cannot migrate VMs to other nodes for maintenance
- Cannot bring up VMs on another node in a node-failure scenario
- Cannot balance/live-migrate VMs for performance


### Realising you're still using config drive is tricky.

We were not aware, once we had the metadata service, that we were still using config_drive for so much.

It was only by inspecting the API calls down through BOSH > Openstack_CPI > Fog > Excon that we saw the 'Parameters' being set. A base64 decode of anything in the 'Parameters' or 'user-data' fields shows they are much the same (with Parameters also having the SSH key, as discussed above).
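If you want to eyeball this yourself, the decode is trivial (the values here are made up for illustration):

```ruby
require "base64"
require "json"

# Both fields carry base64-encoded JSON, so one decode shows the overlap.
# Illustrative payload, not a real capture:
user_data = Base64.encode64(
  JSON.generate("registry" => { "endpoint" => "http://admin:admin@10.0.0.1:25695" })
)

decoded = JSON.parse(Base64.decode64(user_data))
puts decoded["registry"]["endpoint"]
```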

Disabling the parameters requires a code change in the Openstack_CPI:
    
    [hammer@cloudhammer ~]$ cd /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_openstack_cpi-1.2341.0/lib/cloud/openstack/
    [hammer@cloudhammer openstack]$ diff -u cloud.rb.bak cloud.rb
    --- cloud.rb.bak  2014-04-24 13:31:11.892000105 +0100
    +++ cloud.rb 2014-04-24 13:31:48.538997495 +0100
    @@ -228,11 +228,7 @@
              :key_name => keypair.name,
              :security_groups => security_groups,
              :nics => nics,
    -          :user_data => Yajl::Encoder.encode(user_data(server_name, network_spec)),
    -          :personality => [{
    -                            "path" => "#{BOSH_APP_DIR}/user_data.json",
    -                            "contents" => Yajl::Encoder.encode(user_data(server_name, network_spec, keypair.public_key))
    -                          }]
    +          :user_data => Yajl::Encoder.encode(user_data(server_name, network_spec, keypair.public_key))
            }

This disables the use of config drive, as we no longer send 'personality' in the API call to OpenStack. You'll notice we are also now sending the public_key via the metadata service, whereas before we were not. This appears to work fine.

However, this is where you find out whether you were actually using the registry or not. In our case, we were not.
The code change above caused VM creation to always hang, while /var/vcap/bosh/log/current showed the bosh-agent failing repeatedly while trying to get data from the default registry URL (http://admin:admin@localhost:25695).

Seems we'd been using the config_drive all this time.

### Setting the registry URL correctly for new BOSH VMs

I only have experience here with getting the bosh micro cli plugin gems to create a MicroBOSH (as that's what I was doing when my provider turned off config_drive ;) ).

It is possible that once a MicroBOSH/BOSH is up and running, they set themselves as the registry URL correctly.

However, for the micro_bosh.yml bootstrap file, this is the needed magic:

    cloud:
      plugin: openstack
      properties:
        openstack:
          auth_url: https://API:5000/v2.0
          username: userfoo
          api_key: passwordbar
          tenant: Tenant
          default_security_groups:
          - Sec-Group
          default_key_name: Pub-key-stack-name
          private_key: priv-key-path
        registry:
          endpoint: http://admin:admin@<IP OF THE MACHINE YOU'RE RUNNING BOSH MICRO DEPLOY ON>:25695
          user: admin
          password: admin


If you were digging around in the bosh source and found this example:


it tells you to do this:

    openstack:
      username: foo
      api_key: bar
      tenant: bosh
      region:
      endpoint_type: publicURL
      default_key_name: default
      default_security_groups: ["default"]
    registry:
      user: admin
      password: admin

you'll get the following error, as the code now tries to split a username and password from **user:pass@** before the URL, with no error handling if they are not there:

    [hammer@cloudhammer deployments]$ bosh micro deployment tx-col-dev-2/
    /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/registry.rb:14:in `initialize': undefined method `split' for nil:NilClass (NoMethodError)
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager/openstack.rb:14:in `new'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager/openstack.rb:14:in `initialize'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:50:in `new'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:50:in `initialize'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:43:in `new'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:43:in `create'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/cli/commands/micro.rb:327:in `deployer'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/cli/commands/micro.rb:55:in `set_current'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/cli/commands/micro.rb:32:in `micro_deployment'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/lib/cli/command_handler.rb:57:in `run'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/lib/cli/runner.rb:56:in `run'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/lib/cli/runner.rb:16:in `run'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/bin/bosh:7:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/bosh:23:in `load'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/bosh:23:in `<main>'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/ruby_executable_hooks:15:in `eval'
    from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/ruby_executable_hooks:15:in `<main>'
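The failure mode is easy to reproduce outside the deployer. Here's a minimal sketch of the pattern (not the deployer's actual code): `URI#userinfo` is nil when the endpoint has no embedded credentials, so calling split on it raises exactly this NoMethodError:

```ruby
require "uri"

# Minimal reproduction (not the deployer's actual code) of the nil split:
def parse_registry_credentials(endpoint)
  uri = URI.parse(endpoint)
  # URI#userinfo is nil when there's no user:pass@ in the URL, so the
  # split raises NoMethodError, as in the trace above.
  uri.userinfo.split(":")
end

ok = parse_registry_credentials("http://admin:admin@10.0.0.1:25695") # fine

error = nil
begin
  parse_registry_credentials("http://10.0.0.1:25695") # no user:pass@
rescue NoMethodError => e
  error = e
end
```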


### Maybe I'm stupid...

Maybe this is well known by everyone else deploying MicroBOSH / BOSH etc. on OpenStack...

However, I found no useful information searching around these topics, no examples that got me to the solution above, and no true description of the process by which a bosh-agent tries to get its data.

What's more, the documentation on the Openstack_CPI (https://github.com/cloudfoundry/bosh/blob/master/bosh_openstack_cpi/USAGE.md) and the MicroBOSH guide for OpenStack (http://docs.cloudfoundry.org/deploying/openstack/deploying_microbosh.html) give no mention of these options.

It seems to me everyone is just using config_drive, maybe accidentally? Admittedly, it's only when config_drive was taken away from me that the issue came to light.

I'm hoping that someone else is scratching their head as to why their VMs aren't portable, or how they can stop using the config_drive, and this saves them a bit of time.






Ferran Rodenas

Apr 24, 2014, 4:31:59 PM4/24/14
to bosh...@cloudfoundry.org
Matt, some clarifications inline:

2014-04-24 11:56 GMT-07:00 Matt Johnson <m...@matt-j.co.uk>:
> Hi all,
>
> The following is a quick writeup of an issue I've been having (part provider, part documentation, part my config) with getting bootstrap info for the bosh-agent into a VM.
>
> It's here because I'm wondering if I've missed something that everyone else already knows, or if I should indeed get some pull requests cooked up to make metadata OR config_drive (instead of both) a reality.
>
> It's also here to help anyone else who is in the same boat.
>
> (Originally in markup, so this could be ugly!)
>
> # MicroBOSH / BOSH-CLI Registry
>
> The part the 'registry' component plays in the booting of VMs on OpenStack.
>
> ### What is the Registry?
>
> It's a small HTTP server listening on your 'BOSH', which means:
>
> If you're installing a MicroBOSH from the bosh-micro-cli on a cloudhammer:
>   • It's going to run on the cloudhammer when you do a `bosh micro deploy <stemcell>`
> If you're installing a BOSH or another application from a MicroBOSH:
>   • It's going to run on the MicroBOSH VM
> If you're installing another application from a full BOSH:
>   • It will be running on one of your BOSH VMs, depending on your BOSH deployment
>
> The purpose of the Registry (from what I can see) is to provide the VM with all the information the bosh-agent needs to configure the machine.
>
> Items such as:
>
>  - Packages
>  - Blobs
>  - Monit configuration
>  - etc.

No, it only provides info about the NATS endpoint and credentials, blobstore endpoint and credentials, agent id, networking, disk, and some environment variables. Packages, blobs, monit, ... info is provided via NATS when the job is deployed onto the VM (not when the VM is created).
 
> Previously it was thought this was all done via the OpenStack metadata service (or the config drive if the metadata service is unavailable).


The reason for the registry is security concerns. Some users are afraid of us providing credentials via the metadata, as an operator can gain access to the dashboard and look up these credentials. Instead, we provide the credentials via the registry, which has a security mechanism that compares the IP address of the VM requesting info with the IP addresses stored at the registry [1]. This behavior has been discussed by the BOSH team several times: is it really useful? Is it secure? AFAIK there isn't a consensus here yet.
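The IP-matching check described above can be sketched as a toy model (this is not the actual bosh-registry implementation, just the idea): settings are only returned when the requester's IP matches the one recorded for that instance.

```ruby
# Toy model of the registry's IP-based access check (the real
# bosh-registry implementation differs in detail).
class ToyRegistry
  def initialize
    @instances = {} # instance_id => { ip:, settings: }
  end

  def register(instance_id, ip, settings)
    @instances[instance_id] = { ip: ip, settings: settings }
  end

  def read_settings(instance_id, requester_ip)
    record = @instances.fetch(instance_id) { raise "unknown instance" }
    raise "access denied" unless record[:ip] == requester_ip
    record[:settings]
  end
end

registry = ToyRegistry.new
registry.register("vm-1", "10.0.0.5", { "agent_id" => "agent-123" })
```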
 

> However...
>
> ### Stemcell boot and getting info into the VM
>
> When Bosh-CLI/MicroBOSH/BOSH (xBOSH) boots a stemcell through the openstack_cpi, the VM gets its identity the following way **(based on my debugging)**:
>
> - The nova API call 'xBOSH' makes includes some user-data/metadata. This includes:
>   a. A 'Registry Endpoint' URL, defaulting to http://admin:admin@localhost:25695
>   b. The hostname of the VM
>
> - The nova API call 'xBOSH' makes also includes some more data in the 'Parameters' field of the call.
>   This is OpenStack's 'config_drive' option, for passing data to the VM as a mounted CD/HDD. You shouldn't need both; metadata is meant to REPLACE config_drive, yet the Openstack_CPI seems to set both. The SSH key from OpenStack is only set in the Parameters, even though it could go in the metadata (or either).



No, it's not the config drive; it's what OpenStack calls 'server personality' or 'user-data injected files' [2]. The behavior is different from the config-drive [3]. The way it works is by mounting the image file on the compute node, injecting the files into the filesystem of the first partition of the mounted file, unmounting the image file, and then booting the modified image. This has several drawbacks: 1) although OpenStack has some security mechanisms to make sure that you don't inject files in a wrong location, it is still a potential security hole; 2) if you use libguestfs on the compute node, the injection mechanism will fail with the BOSH stemcells, as libguestfs is unable to mount the stemcell partitions (several users have reported problems in this mailing list when using BOSH+OpenStack+libguestfs); 3) it disables HA/DR when using an external backend for VM boot disks.

The metadata service is not meant to replace config drive or server personalities. In fact, it was the first mechanism in OpenStack for sending arbitrary data to the VM. Server personalities were added later, and config drive arrived a year and a half ago.

We still need both (or all 3) mechanisms. Not all operators are comfortable with a metadata server (it can also be a security hole if someone tries to fake the MAC address). For example, Rackspace doesn't use the metadata server in their public offering. This is also why the 'cloud-init' package has the same behavior; it tries several mechanisms to retrieve the user data [4] (BTW, we're not using cloud-init in BOSH because it doesn't support all platforms).

Regarding the SSH key: if you're using the metadata server, then it is automatically injected into the metadata by OpenStack, so you don't need to send it via the user-data. But if you're using server personalities or config-drive, then you need to manually send the SSH key when injecting the user data.

 
> - The bosh-agent within the VM runs, and then:
>
> a. Tries to retrieve information from the metadata service.
>
> - Armed with the metadata info, the bosh-agent then expects one of two things:
>
> a. To use the Registry Endpoint URL to connect to the Registry service on xBOSH and pull down more information to configure the VM to its final state (applications, Monit, whatever the BOSH release you're deploying to the VM wants).
> b. *Or* to get exactly the same data from a .json file in a known location, placed there through the config_drive.

No, first it tries to retrieve info from the metadata service; if that fails, then it tries to get the info from the JSON file. But both of them have the same info: the address of the registry.
 
> ### Why Config_drive (using parameters in the Openstack API call) is bad
>
> Config drives are the user's data (usually JSON) from the 'Parameters' section of the OpenStack API call, put into a file and presented on a fake HDD/CD mount within the VM at boot time.
>
> The problem is that these tiny HDD/CDs are stored locally on the hypervisor that boots the machine. This means that even if your VM is booted from Ceph or some other shared storage solution which provides mobility for your VM, the config_drive will pin that VM to the hypervisor node you're on.
>
> Some bad scenarios:
>
> - Cannot migrate VMs to other nodes for maintenance
> - Cannot bring up VMs on another node in a node-failure scenario
> - Cannot balance/live-migrate VMs for performance



Agreed, if we're talking about the server personalities. But I'm not really sure about config-drive. I believe that config-drive is in-memory storage that is exposed to the VM, and you need to explicitly mount it if you want to access the contents. But I'm not really sure; it has been a while since I last played with config-drive.
 
> ### Realising you're still using config drive is tricky.
>
> We were not aware, once we had the metadata service, that we were still using config_drive for so much.
>
> It was only by inspecting the API calls down through BOSH > Openstack_CPI > Fog > Excon that we saw the 'Parameters' being set. A base64 decode of anything in the 'Parameters' or 'user-data' fields shows they are much the same (with Parameters also having the SSH key, as discussed above).
>
> Disabling the parameters requires a code change in the Openstack_CPI:
>
>     [hammer@cloudhammer ~]$ cd /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_openstack_cpi-1.2341.0/lib/cloud/openstack/
>     [hammer@cloudhammer openstack]$ diff -u cloud.rb.bak cloud.rb
>     --- cloud.rb.bak  2014-04-24 13:31:11.892000105 +0100
>     +++ cloud.rb 2014-04-24 13:31:48.538997495 +0100
>     @@ -228,11 +228,7 @@
>               :key_name => keypair.name,
>               :security_groups => security_groups,
>               :nics => nics,
>     -          :user_data => Yajl::Encoder.encode(user_data(server_name, network_spec)),
>     -          :personality => [{
>     -                            "path" => "#{BOSH_APP_DIR}/user_data.json",
>     -                            "contents" => Yajl::Encoder.encode(user_data(server_name, network_spec, keypair.public_key))
>     -                          }]
>     +          :user_data => Yajl::Encoder.encode(user_data(server_name, network_spec, keypair.public_key))
>             }
>
> This disables the use of config drive, as we no longer send 'personality' in the API call to OpenStack. You'll notice we are also now sending the public_key via the metadata service, whereas before we were not. This appears to work fine.


As I said before, you don't need to send the SSH key in the user-data if using the metadata server. OpenStack is doing this automatically for you.
 
> However, this is where you find out whether you were actually using the registry or not. In our case, we were not.
> The code change above caused VM creation to always hang, while /var/vcap/bosh/log/current showed the bosh-agent failing repeatedly while trying to get data from the default registry URL (http://admin:admin@localhost:25695).
>
> Seems we'd been using the config_drive all this time.


If your provider allows metadata, it's the first approach the BOSH agent will use. It will only fall back to the server personality if the metadata server fails.
There may be a problem with that version of the micro deployer gem. You don't need to set the IP address of the local machine where you're running bosh micro deploy. The micro deployer is going to do all the work for you. It will set up an SSH tunnel between your local machine and the MicroBOSH agent, and it's going to discover the IP address of the MicroBOSH VM. Can you please try with a newer version of the gem?
 
> ### Maybe I'm stupid...


Nobody is stupid here, and I'm really glad these kinds of questions come to the mailing list. This will help others with similar problems/concerns/questions, and it denotes a problem with the BOSH documentation.
 
> Maybe this is well known by everyone else deploying MicroBOSH / BOSH etc. on OpenStack...
>
> However, I found no useful information searching around these topics, no examples that got me to the solution above, and no true description of the process by which a bosh-agent tries to get its data.


There is some documentation at the bosh-agent [5]. We discussed this topic with DrNic last year, and he created some beautiful videos that explain how the bosh-agent works: [6] [7] [8]

 
> What's more, the documentation on the Openstack_CPI (https://github.com/cloudfoundry/bosh/blob/master/bosh_openstack_cpi/USAGE.md) and the MicroBOSH guide for OpenStack (http://docs.cloudfoundry.org/deploying/openstack/deploying_microbosh.html) give no mention of these options.
>
> It seems to me everyone is just using config_drive, maybe accidentally? Admittedly, it's only when config_drive was taken away from me that the issue came to light.


I'm still not sure how this problem came to you, as by default the bosh-agent uses the metadata server, so you shouldn't notice anything if your operator disabled the server personalities, unless they disabled the metadata server.
 
> I'm hoping that someone else is scratching their head as to why their VMs aren't portable, or how they can stop using the config_drive, and this saves them a bit of time.

 
I'm completely in favor of disabling the server personalities; it has been a headache for lots of people. Some time ago I submitted a PR to also allow config-drive [9] (as a fallback mechanism to metadata and server personalities). Last week we discussed with the BOSH team not merging that PR, because what we want to do is disable the server personalities and enable the config drive as the only fallback mechanism. The story is on the public tracker: [10]


Hope this clarifies the role of the BOSH registry and agent, and the different OpenStack mechanisms for sending arbitrary data to the VM. Let me know if there's something that is still not clear.

- Ferdy

Dr Nic Williams

Apr 29, 2014, 12:58:32 AM4/29/14
to bosh...@cloudfoundry.org
Thanks Matt & Ferdy for this Q&A happening in the mailing list!