Gem issues and other errors in bosh-stemcell-1525-openstack-kvm-ubuntu

m...@matt-j.co.uk

Dec 20, 2013, 8:42:49 PM
to vcap...@cloudfoundry.org
Hi All,

I'm running into an issue when deploying MicroBOSH to OpenStack with stemcell 1525 (Ubuntu).

From what I can see, certain monit services are failing (the health monitor and all of the workers):

root@bm-ed32ef58-5eae-4d6f-874c-e450e3703162:~# monit status
The Monit daemon 5.2.4 uptime: 1h 31m

Process 'nats'
  status                            running
  monitoring status                 monitored
  pid                               2258
  parent pid                        1
  uptime                            1h 31m
  children                          0
  memory kilobytes                  18076
  memory kilobytes total            18076
  memory percent                    0.2%
  memory percent total              0.2%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:48 2013

Process 'redis'
  status                            running
  monitoring status                 monitored
  pid                               2265
  parent pid                        1
  uptime                            1h 31m
  children                          0
  memory kilobytes                  1892
  memory kilobytes total            1892
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.1%
  cpu percent total                 0.1%
  data collected                    Sat Dec 21 01:29:48 2013

Process 'postgres'
  status                            running
  monitoring status                 monitored
  pid                               2324
  parent pid                        1
  uptime                            1h 31m
  children                          14
  memory kilobytes                  4956
  memory kilobytes total            50400
  memory percent                    0.0%
  memory percent total              0.6%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:48 2013

Process 'powerdns'
  status                            running
  monitoring status                 monitored
  pid                               2414
  parent pid                        1
  uptime                            1h 31m
  children                          0
  memory kilobytes                  3564
  memory kilobytes total            3564
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:48 2013

Process 'blobstore_nginx'
  status                            running
  monitoring status                 monitored
  pid                               2430
  parent pid                        1
  uptime                            1h 31m
  children                          2
  memory kilobytes                  2652
  memory kilobytes total            11036
  memory percent                    0.0%
  memory percent total              0.1%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:48 2013

Process 'director'
  status                            running
  monitoring status                 monitored
  pid                               2340
  parent pid                        1
  uptime                            1h 31m
  children                          0
  memory kilobytes                  43160
  memory kilobytes total            43160
  memory percent                    0.5%
  memory percent total              0.5%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:48 2013

Process 'worker_1'
  status                            Does not exist
  monitoring status                 monitored
  data collected                    Sat Dec 21 01:29:49 2013

Process 'worker_2'
  status                            not monitored
  monitoring status                 not monitored
  data collected                    Sat Dec 21 01:29:07 2013

Process 'worker_3'
  status                            Does not exist
  monitoring status                 monitored
  data collected                    Sat Dec 21 01:29:08 2013

Process 'director_scheduler'
  status                            running
  monitoring status                 monitored
  pid                               2438
  parent pid                        1
  uptime                            1h 30m
  children                          0
  memory kilobytes                  43668
  memory kilobytes total            43668
  memory percent                    0.5%
  memory percent total              0.5%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:08 2013

Process 'director_nginx'
  status                            running
  monitoring status                 monitored
  pid                               2451
  parent pid                        1
  uptime                            1h 30m
  children                          2
  memory kilobytes                  2712
  memory kilobytes total            12104
  memory percent                    0.0%
  memory percent total              0.1%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:08 2013

Process 'registry'
  status                            running
  monitoring status                 monitored
  pid                               2459
  parent pid                        1
  uptime                            1h 30m
  children                          0
  memory kilobytes                  42400
  memory kilobytes total            42400
  memory percent                    0.5%
  memory percent total              0.5%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Sat Dec 21 01:29:08 2013

Process 'health_monitor'
  status                            Execution failed
  monitoring status                 monitored
  data collected                    Sat Dec 21 01:29:38 2013

System 'system_bm-ed32ef58-5eae-4d6f-874c-e450e3703162'
  status                            running
  monitoring status                 monitored
  load average                      [0.29] [0.25] [0.24]
  cpu                               1.1%us 0.4%sy 0.0%wa
  memory usage                      390604 kB [4.7%]
  swap usage                        0 kB [0.0%]
  data collected                    Sat Dec 21 01:29:38 2013


Looking through the monit start and stop commands for the health monitor, and running the start command manually to get more debug output, gives me a missing gem:

root@bm-ed32ef58-5eae-4d6f-874c-e450e3703162:~# exec chpst -u vcap:vcap  /var/vcap/packages/health_monitor/bin/bosh-monitor
/var/vcap/packages/ruby/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb:247:in `to_specs': Could not find bosh-monitor (>= 0) amongst [bigdecimal-1.1.0, bundler-1.2.3, io-console-0.3, json-1.5.5, minitest-2.5.1, rake-0.9.2.2, rdoc-3.9.5] (Gem::LoadError)
from /var/vcap/packages/ruby/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb:256:in `to_spec'
from /var/vcap/packages/ruby/lib/ruby/site_ruby/1.9.1/rubygems.rb:1231:in `gem'
from /var/vcap/packages/health_monitor/bin/bosh-monitor:22:in `<main>'
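
I suspect the ctl script that monit calls normally exports GEM_HOME/GEM_PATH before exec'ing that binary, so running it bare like this may not be a fair test. Something along these lines is what I had in mind to rule that out - the gem_home path is a guess on my part, so adjust it to wherever the package actually keeps its gems:

# guessing at the package layout here
export GEM_HOME=/var/vcap/packages/health_monitor/gem_home
export GEM_PATH=$GEM_HOME
/var/vcap/packages/ruby/bin/gem list bosh-monitor
chpst -u vcap:vcap /var/vcap/packages/health_monitor/bin/bosh-monitor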

However, I don't seem to be able to install the gem:

root@bm-ed32ef58-5eae-4d6f-874c-e450e3703162:~# gem install bosh-monitor -v 1.5.0.pre.1525 --pre
ERROR:  Could not find a valid gem 'bosh-monitor' (= 1.5.0.pre.1525) in any repository
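
It's probably also worth checking whether the gem is vendored inside the package at all, rather than expected to come from rubygems.org - I'm not sure these pre-release build gems are ever published there, which would explain the install failure (that's an assumption on my part). A quick check:

find /var/vcap/packages/health_monitor -name 'bosh-monitor*'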

The workers give a different error:
/var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/connection/ruby.rb:124:in `connect_nonblock': Invalid argument - connect(2) (Errno::EINVAL)
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/connection/ruby.rb:124:in `connect'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/connection/ruby.rb:180:in `connect'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:271:in `establish_connection'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:69:in `connect'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:290:in `ensure_connected'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:177:in `block in process'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:256:in `logging'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:176:in `process'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:84:in `call'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:737:in `block in get'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:36:in `block in synchronize'
from /var/vcap/packages/ruby/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:36:in `synchronize'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:736:in `get'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:517:in `job'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:450:in `unregister_worker'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:167:in `rescue in work'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:124:in `work'
from /var/vcap/packages/director/gem_home/gems/bosh-director-1.5.0.pre.1525/bin/bosh-director-worker:76:in `<top (required)>'
from /var/vcap/packages/director/bin/bosh-director-worker:23:in `load'
from /var/vcap/packages/director/bin/bosh-director-worker:23:in `<main>'
/var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/connection/ruby.rb:124:in `connect_nonblock': Invalid argument - connect(2) (Errno::EINVAL)
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/connection/ruby.rb:124:in `connect'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/connection/ruby.rb:180:in `connect'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:271:in `establish_connection'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:69:in `connect'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:290:in `ensure_connected'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:177:in `block in process'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:256:in `logging'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:176:in `process'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis/client.rb:84:in `call'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:737:in `block in get'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:36:in `block in synchronize'
from /var/vcap/packages/ruby/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:36:in `synchronize'
from /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib/redis.rb:736:in `get'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:517:in `job'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:450:in `unregister_worker'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:167:in `rescue in work'
from /var/vcap/packages/director/gem_home/gems/resque-1.23.1/lib/resque/worker.rb:124:in `work'
from /var/vcap/packages/director/gem_home/gems/bosh-director-1.5.0.pre.1525/bin/bosh-director-worker:76:in `<top (required)>'
from /var/vcap/packages/director/bin/bosh-director-worker:23:in `load'
from /var/vcap/packages/director/bin/bosh-director-worker:23:in `<main>'


This may be related to something strange I've noticed in the YAML configuration files for the workers and the director: host directives that look like they should contain IP addresses instead only contain '40', as below (director.yml):

root@bm-ed32ef58-5eae-4d6f-874c-e450e3703162:/var/vcap/jobs/director/config# cat director.yml
---
name: xxxxxxxxMicroBOSH
port: 25556
encryption: false
max_tasks: 500
max_threads: 3
logging:
  level: DEBUG
  file: /var/vcap/sys/log/director/director.debug.log

redis:
  host: 40
  port: 25255
  password: redis
  logging:
    level: info
mbus: nats://nats:nats@40:4222
dir: /var/vcap/store/director
db:
  adapter: postgres
  user: postgres
  password: postges
  host: 127.0.0.1
  port: 5432
  database: bosh
  connection_options: {"max_connections":32,"pool_timeout":10}
snapshots:
  enabled: false

scheduled_jobs:

  - command: SnapshotDeployments
    schedule: 0 0 7 * * * UTC


  - command: SnapshotSelf
    schedule: 0 0 6 * * * UTC



scan_and_fix:
  auto_fix_stateful_nodes: true

dns:
  server: 40

  domain_name: microbosh

  db:
    adapter: postgres
    user: postgres
    password: postges
    host: 127.0.0.1
    port: 5432
    database: bosh
    connection_options: {"max_connections":32,"pool_timeout":10}


blobstore:
  provider: dav
  options:

    endpoint: http://40:25250
    user: director
    password: director







cloud:



  plugin: openstack
  properties:
    openstack:
      username: xxxxxxxxxx
      api_key: xxxxxxxxx
      tenant: xxxxxxxxx


      endpoint_type: publicURL




      default_key_name: nimbusmicrobosh
      default_security_groups: ["MicroBOSH-ALL-SC"]
    registry:
      endpoint: http://40:25777
      user: admin
      password: admin




    agent:
      ntp: [ntp.xxxxxxxx]
      blobstore:
        provider: dav
        options:

          endpoint: 'http://10.123.179.189:25250'
          user: agent
          password: agent

      mbus: nats://nats:na...@10.123.179.189:4222
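
If it helps with debugging, I'd expect the worker failure to be reproducible directly from that bogus redis host value with a one-liner like the one below - the -I path and gem version are just lifted from the backtrace above, and my assumption is that '40' ends up being treated as the numeric address 0.0.0.40, which connect(2) then rejects:

/var/vcap/packages/ruby/bin/ruby \
  -I /var/vcap/packages/director/gem_home/gems/redis-3.0.3/lib \
  -e 'require "redis"; Redis.new(host: "40", port: 25255, timeout: 1).get("foo")'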


I'll be trying another stemcell soon to see if there are any differences, but in the meantime any help on what exactly I'm looking at here, or on next steps, would be really appreciated!

Many thanks in advance and Happy Christmas (nearly!).

Matt




Jeremy Budnack

Dec 28, 2013, 9:24:36 PM
to vcap...@cloudfoundry.org, m...@matt-j.co.uk
Hello Matt - Hope you had a great Christmas!

When was the last time you updated the BOSH CLI gem? If it's been a while, I'd try a newer version and then give it another go. In general, I'd either try the latest version of the CLI gem, or at least match the gem to the version of the stemcell you are using. If you have a specific need for v1525, prior versions of the CLI to try are listed here: http://rubygems.org/gems/bosh_cli_plugin_micro/versions
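
To make that concrete, something along these lines should do it - the pinned version string below is only a guess, so pick whichever build from that page is closest to your stemcell:

gem install bosh_cli bosh_cli_plugin_micro --pre
# or pin to a build matching the stemcell:
gem install bosh_cli_plugin_micro -v 1.5.0.pre.1525 --pre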

As for the IP addresses: I don't know - pure speculation on my part, but maybe a "split" or "join" went awry when these files were generated? I'd still try updating the CLI gem and see if that goes away (again, pure speculation).

Hope this helps,
Jeremy

m...@matt-j.co.uk

Dec 29, 2013, 9:42:00 AM
to vcap...@cloudfoundry.org, m...@matt-j.co.uk
Hi Jeremy,

Very good Christmas thanks, same to you!

I appreciate any help on this one I can get (I'd got to the point where I was looking at this and just going... what!).

I was indeed running a slightly older CLI gem than my stemcell:

bosh-registry (1.5.0.pre.1478)
bosh-stemcell (1.5.0.pre.1478)
bosh_aws_cpi (1.5.0.pre.1478)
bosh_cli (1.5.0.pre.1478)
bosh_cli_plugin_micro (1.5.0.pre.1478)
bosh_common (1.5.0.pre.1478)
bosh_cpi (1.5.0.pre.1478)
bosh_openstack_cpi (1.5.0.pre.1478)
bosh_vcloud_cpi (0.4.9)
bosh_vsphere_cpi (1.5.0.pre.1478)


I've just tested with the latest gems via gem update... and it's worked: all of the issues (including the '40' parse error) have gone away! Director, workers and monit are all happy.
I've been using CF/BOSH for a while now, so I'm not used to BOSH gem updates coming out this thick and fast and needing to match the stemcell versions (committed to memory now). I'm sure I would have spun on this for much longer, so many thanks for the help!

Enjoy your new year and thanks again.

Matt