The following is a quick writeup of an issue I've been having (part provider, part documentation, part my config) with getting bootstrap info for the bosh-agent into a VM.
It's here because I'm wondering whether I've missed something everyone else already knows, or whether I should cook up some pull requests to make metadata OR config_drive (instead of both) a reality.
It's also here to help anyone who is in the same boat.
# MicroBOSH / BOSH-CLI Registry
The part the 'registry' component plays in booting VMs on OpenStack.
### What is the Registry?
It's a small HTTP server listening on your 'BOSH'. Which machine that is depends on where you are deploying from:
- If you're installing MicroBOSH with the bosh micro CLI plugin from a workstation (ours is called cloudhammer), it runs on that workstation when you do a 'bosh micro deploy <stemcell>'.
- If you're installing a BOSH or another application from a MicroBOSH, it runs on the MicroBOSH VM.
- If you're installing another application from a full BOSH, it runs on one of your BOSH VMs, depending on your BOSH deployment.
The purpose of the Registry (from what I can see) is to provide the VM with all the information the bosh-agent needs to configure the machine, items such as:
- Packages
- Blobs
- Monit configuration
- etc.
Previously we thought this was all done via the OpenStack metadata service (or the config drive if the metadata service is unavailable).
However...
### Stemcell boot and getting info into the VM
When the BOSH CLI, MicroBOSH or BOSH (xBOSH from here on) boots a stemcell through the openstack_cpi, the VM gets its identity in the following way **(based on my debugging)**:
- The nova API call xBOSH makes includes some user-data/metadata. This includes:
  - A 'Registry Endpoint' URL, defaulting to http://admin:admin@localhost:25695
  - The hostname of the VM
- The nova API call xBOSH makes also includes some more data in the 'Parameters' field of the call.
  This is OpenStack's 'config_drive' option, for passing data to the VM as a mounted CD/HDD. You shouldn't need both; metadata is meant to REPLACE config_drive, yet the Openstack_CPI seems to set both. The SSH key from OpenStack is only set in the Parameters, even though it could just as well go in the metadata.
- The bosh-agent within the VM runs and then tries to retrieve information from the metadata service.
- Armed with the metadata info, the bosh-agent then expects one of two things:
  - To use the Registry Endpoint URL to connect to the Registry service on xBOSH and pull down more information to configure the VM to its final state (applications, Monit, whatever the BOSH release you're deploying to the VM wants).
  - *Or* to get exactly the same data from a .json file in a known location, placed there through the config_drive.
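
To make the two payloads in the steps above concrete, here is a minimal Ruby sketch of how they might differ. It is illustrative only: `build_user_data` and the key names `registry`, `server` and `openssh` are assumptions made for the example, not confirmed field names from the CPI, and the VM name and public key are dummy values.

```ruby
require 'json'

# Hypothetical helper: builds the bootstrap payload described above.
# The key names are assumptions for illustration, not taken from the CPI.
def build_user_data(server_name, registry_endpoint, public_key = nil)
  data = {
    'registry' => { 'endpoint' => registry_endpoint },
    'server'   => { 'name' => server_name }
  }
  # In our debugging only the 'Parameters' copy carried the SSH key.
  data['openssh'] = { 'public_key' => public_key } if public_key
  data
end

endpoint = 'http://admin:admin@localhost:25695' # the default seen in the API call

metadata_payload   = build_user_data('vm-example', endpoint)
parameters_payload = build_user_data('vm-example', endpoint, 'ssh-rsa AAAA-dummy-key')

# Top-level keys present only in the 'Parameters' copy.
extra_keys = parameters_payload.keys - metadata_payload.keys

puts JSON.pretty_generate(metadata_payload)
puts "Only in Parameters: #{extra_keys.inspect}"
```

The point is only that the two copies carry almost the same information, so dropping one of them should not lose anything as long as the key travels with the other.
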
### Why Config_drive (using parameters in the Openstack API call) is bad
Config drives are the user's data (usually JSON) from the 'Parameters' section of the OpenStack API call, put into a file and presented on a fake HDD/CD mount within the VM at boot time.
The problem is that these tiny HDD/CD images are stored locally on the hypervisor that boots the machine. This means that even if your VM is booted from Ceph or some other shared storage solution which provides mobility for your VM, the config_drive will pin that VM to the hypervisor node you're on.
Some bad scenarios:
- Cannot migrate VMs to other nodes for maintenance
- Cannot bring up VMs on another node in a node failure scenario
- Cannot balance/live migrate VMs for performance
### Realising you're still using config drive is tricky.
Once we had the metadata service, we were not aware that we were still using config_drive for so much.
It was only by inspecting the API calls down through BOSH > Openstack_CPI > Fog > Excon that we saw the 'Parameters' being set. A base64 decode of anything in the 'Parameters' or 'user-data' fields shows they are much the same (with Parameters also having the SSH key, as discussed above).
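
If you want to repeat that comparison yourself, something along these lines may help. It is a rough sketch, assuming you have already copied the two base64 strings out of a request trace; `compare_payloads` is a made-up helper and the sample blobs below are invented stand-ins, not captured data.

```ruby
require 'base64'
require 'json'

# Decode a base64-encoded JSON blob as captured from the API call.
def decode_payload(encoded)
  JSON.parse(Base64.decode64(encoded))
end

# Report which top-level keys differ between the two decoded payloads.
def compare_payloads(user_data_b64, parameters_b64)
  user_data  = decode_payload(user_data_b64)
  parameters = decode_payload(parameters_b64)
  shared     = user_data.keys & parameters.keys
  {
    'only_in_user_data'  => user_data.keys - parameters.keys,
    'only_in_parameters' => parameters.keys - user_data.keys,
    'different_values'   => shared.reject { |key| user_data[key] == parameters[key] }
  }
end

# Invented sample blobs standing in for values copied out of a request trace.
base = { 'registry' => { 'endpoint' => 'http://admin:admin@localhost:25695' },
         'server'   => { 'name' => 'vm-example' } }
user_data_b64  = Base64.strict_encode64(JSON.generate(base))
parameters_b64 = Base64.strict_encode64(
  JSON.generate(base.merge('openssh' => { 'public_key' => 'ssh-rsa AAAA-dummy-key' }))
)

report = compare_payloads(user_data_b64, parameters_b64)
puts JSON.pretty_generate(report)
```
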
Disabling the parameters requires a code change in the Openstack_CPI:

    [hammer@cloudhammer ~]$ cd /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_openstack_cpi-1.2341.0/lib/cloud/openstack/
    [hammer@cloudhammer openstack]$ diff -u cloud.rb.bak cloud.rb
    --- cloud.rb.bak 2014-04-24 13:31:11.892000105 +0100
    +++ cloud.rb 2014-04-24 13:31:48.538997495 +0100
    @@ -228,11 +228,7 @@
         :security_groups => security_groups,
         :nics => nics,
    -    :user_data => Yajl::Encoder.encode(user_data(server_name, network_spec)),
    -    :personality => [{
    -      "path" => "#{BOSH_APP_DIR}/user_data.json",
    -      "contents" => Yajl::Encoder.encode(user_data(server_name, network_spec, keypair.public_key))
    -    }]
    +    :user_data => Yajl::Encoder.encode(user_data(server_name, network_spec, keypair.public_key))
       }
This disables the use of config drive, as we no longer send 'Personality' in the API call to OpenStack. You'll notice we are now also sending the public_key through the metadata service, whereas before we were not. This appears to work fine.
However, this is where you find out whether you were actually using the registry or not. In our case, we were not.
The code change above caused VM creation to always hang, while /var/vcap/bosh/log/current showed the bosh-agent failing repeatedly while trying to get data from the default registry URL (http://admin:admin@localhost:25695).
Seems we'd been using the config_drive all this time.
### Setting the registry URL correctly for new BOSH VMs
I only have experience here with getting the bosh micro CLI plugin gems to create a MicroBOSH (as that's what I was doing when my provider turned off config_drive ;) ).
It is possible that once a MicroBOSH/BOSH is up and running, it sets itself as the registry URL correctly.
However, for the micro_bosh.yml bootstrap file, this is the needed magic:

    cloud:
      plugin: openstack
      properties:
        openstack:
          username: userfoo
          api_key: passwordbar
          tenant: Tenant
          default_security_groups:
            - Sec-Group
          default_key_name: Pub-key-stack-name
          private_key: priv-key-path
        registry:
          endpoint: http://admin:admin@<IP OF THE MACHINE YOU'RE RUNNING BOSH MICRO DEPLOY ON>:25695
          user: admin
          password: admin
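
Before kicking off a deploy, a quick sanity check on the manifest can catch the localhost default. The sketch below is only a suggestion: `registry_endpoint_problems` is a made-up helper, it assumes the registry block is nested under cloud > properties, and 192.0.2.10 is a placeholder documentation address rather than a real deployer IP.

```ruby
require 'yaml'
require 'uri'

# Hypothetical check: returns a list of problems with the registry endpoint.
# Assumes the registry block sits under cloud.properties in the manifest.
def registry_endpoint_problems(manifest_text)
  manifest = YAML.safe_load(manifest_text) || {}
  endpoint = manifest.dig('cloud', 'properties', 'registry', 'endpoint')
  return ['registry endpoint is not set'] if endpoint.nil?

  problems = []
  host = URI.parse(endpoint).host
  if ['localhost', '127.0.0.1'].include?(host)
    # Inside the new VM, localhost refers to the VM itself.
    problems << 'registry endpoint host is localhost'
  end
  problems
end

# Trimmed-down sample manifest with a placeholder address.
sample = <<~YAML
  cloud:
    plugin: openstack
    properties:
      registry:
        endpoint: http://admin:admin@192.0.2.10:25695
        user: admin
        password: admin
YAML

puts registry_endpoint_problems(sample).inspect
```
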
If you were digging around in the bosh source and found this example:
it tells you to do this:

    openstack:
      username: foo
      api_key: bar
      tenant: bosh
      region:
      endpoint_type: publicURL
      default_key_name: default
      default_security_groups: ["default"]
    registry:
      user: admin
      password: admin
Do that and you'll get the following error, as the code now tries to split a username and password from **user:pass@** before the URL, with no error handling if they are not there:

    [hammer@cloudhammer deployments]$ bosh micro deployment tx-col-dev-2/
    /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/registry.rb:14:in `initialize': undefined method `split' for nil:NilClass (NoMethodError)
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager/openstack.rb:14:in `new'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager/openstack.rb:14:in `initialize'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:50:in `new'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:50:in `initialize'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:43:in `new'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/deployer/instance_manager.rb:43:in `create'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/cli/commands/micro.rb:327:in `deployer'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/cli/commands/micro.rb:55:in `set_current'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli_plugin_micro-1.2341.0/lib/bosh/cli/commands/micro.rb:32:in `micro_deployment'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/lib/cli/command_handler.rb:57:in `run'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/lib/cli/runner.rb:56:in `run'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/lib/cli/runner.rb:16:in `run'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/gems/bosh_cli-1.2341.0/bin/bosh:7:in `<top (required)>'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/bosh:23:in `load'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/bosh:23:in `<main>'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/ruby_executable_hooks:15:in `eval'
        from /usr/local/rvm/gems/ruby-2.0.0-p451/bin/ruby_executable_hooks:15:in `<main>'
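
For anyone cooking up a pull request, this is roughly the kind of guard I would expect around that split. It is a sketch using Ruby's standard URI library, not the plugin's actual code: `parse_registry_endpoint` is a hypothetical helper and 192.0.2.10 is a placeholder address.

```ruby
require 'uri'

# Hypothetical helper, not the plugin's code: pulls the credentials out of the
# registry endpoint and fails with a readable message instead of a NoMethodError.
def parse_registry_endpoint(endpoint)
  if endpoint.nil? || endpoint.strip.empty?
    raise ArgumentError, 'registry endpoint is missing from the deployment config'
  end

  uri = URI.parse(endpoint)
  if uri.user.nil? || uri.password.nil?
    raise ArgumentError, "registry endpoint lacks user:pass@ before the host: #{endpoint}"
  end

  { user: uri.user, password: uri.password, host: uri.host, port: uri.port }
end

parsed = parse_registry_endpoint('http://admin:admin@192.0.2.10:25695')
puts parsed.inspect

# A config without an endpoint now produces a clear error instead of a stack trace.
begin
  parse_registry_endpoint(nil)
rescue ArgumentError => e
  puts "Config error: #{e.message}"
end
```
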
### Maybe I'm stupid...
Maybe this is well known by everyone else deploying MicroBOSH / BOSH etc. on OpenStack.
However, I found no useful information searching around these topics, no examples that got me to the solution above, and no true description of how a bosh-agent tries to get its data.
It seems to me everyone is just using config_drive, maybe accidentally? Admittedly, it was only when config_drive was taken away from me that the issue came to light.
I'm hoping that someone else is scratching their head as to why their VMs aren't portable, or how to stop using the config_drive, and that this saves them a bit of time.