Microbosh - director does not work


mohamma...@gmail.com

Aug 10, 2014, 5:29:26 PM
to bosh-...@cloudfoundry.org

Hi,

I have deployed micro-bosh on OpenStack using the stemcell "bosh-stemcell-2657-openstack-kvm-ubuntu-trusty-go_agent.tgz", but I am not able to upload the stemcell to micro-bosh. The task either stays in the queue or, at best, reaches the state "Extracting stemcell archive" and hangs.

This is the output of "bosh task 7 --debug":

I, [2014-08-10T00:30:31.694531 #6979]  INFO -- : Director Version : 1.2657.0

I, [2014-08-10T00:30:31.695667 #6979]  INFO -- : Enqueuing task: 7

I, [2014-08-10T00:35:47.315936 #7726] [0x13d1070]  INFO -- : Looking for task with task id 7

D, [2014-08-10T00:35:47.413239 #7726] [0x13d1070] DEBUG -- : (0.033323s) SELECT * FROM "tasks" WHERE "id" = 7

I, [2014-08-10T00:35:47.565416 #7726] [0x13d1070]  INFO -- : Starting task: 7

I, [2014-08-10T00:35:47.579955 #7726] [task:7]  INFO -- : Creating job

D, [2014-08-10T00:37:22.955582 #7726] [task:7] DEBUG -- : (0.019012s) SELECT * FROM "tasks" WHERE "id" = 7

I, [2014-08-10T00:37:22.966266 #7726] [task:7]  INFO -- : Performing task: 7

D, [2014-08-10T00:37:22.994471 #7726] [task:7] DEBUG -- : (0.006916s) BEGIN

D, [2014-08-10T00:37:23.079468 #7726] [task:7] DEBUG -- : (0.027711s) UPDATE "tasks" SET "state" = 'processing', "timestamp" = '2014-08-10 00:37:22.969425+0000', "description" = 'create stemcell', "result" = NULL, "output" = '/var/vcap/store/director/tasks/7', "user_id" = NULL, "checkpoint_time" = '2014-08-10 00:37:22.976624+0000', "type" = 'update_stemcell' WHERE ("id" = 7)

D, [2014-08-10T00:37:23.135070 #7726] [task:7] DEBUG -- : (0.047460s) COMMIT

I, [2014-08-10T00:37:23.137392 #7726] [task:7]  INFO -- : Processing update stemcell

I, [2014-08-10T00:37:23.171104 #7726] [task:7]  INFO -- : Extracting stemcell archive
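The long pauses between these log lines are easier to see when computed. The following is a minimal sketch, not part of BOSH: `log_gaps` and its `threshold_seconds` parameter are hypothetical names, and the regular expression assumes the Ruby Logger timestamp format shown in the lines above.

```python
import re
from datetime import datetime

# Matches the timestamp in Ruby Logger lines such as
# "I, [2014-08-10T00:30:31.694531 #6979]  INFO -- : ..."
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) #\d+\]")


def log_gaps(lines, threshold_seconds=60.0):
    """Return (gap_seconds, previous_line, next_line) for each gap above the threshold."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue  # skip blank lines and lines without a timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta > threshold_seconds:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = current, line
    return gaps
```

Applied to the output above, this would report roughly 316 seconds between "Enqueuing task: 7" and "Looking for task with task id 7", and roughly 95 seconds between "Creating job" and the next query.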

 

I am new to this area, so I am not sure whether I am doing something wrong. I tried to investigate and noticed that the director and worker processes do not stay up for long; they seem to crash or shut down and then restart. For example, monit status consistently shows every process except the director and the workers as running with a long uptime, while the status of the director and the workers is unstable (alternating between "running" with only a few minutes of uptime, "not monitored", and "failed to execute").

I am not sure whether the director works properly after it restarts. The director.stderr.log shows this error message:

/var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:526:in `start_tcp_server': no acceptor (port is in use or requires root privileges) (RuntimeError)

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:526:in `start_server'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/tcp_server.rb:16:in `connect'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/base.rb:55:in `block in start'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `call'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'

        from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/bosh-director-1.2657.0/bin/bosh-director:36:in `<top (required)>'

        from /var/vcap/packages/director/bin/bosh-director:16:in `load'

        from /var/vcap/packages/director/bin/bosh-director:16:in `<main>'
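The error above says the listen port is either already in use or requires root privileges. One way to test that directly on the micro-bosh VM is to try binding the port by hand. This is a minimal sketch under assumptions: `can_bind` is a hypothetical helper, and the port to pass in is whatever the attached director.yml configures, which is not shown in this message.

```python
import socket


def can_bind(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now.

    Returns False on any bind failure, which covers both cases named in
    the error message: the port is already taken, or the current user
    lacks the privileges to bind it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True
```

If this returns False while monit reports the director as stopped, an earlier director process may still be holding the port.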

 

Here is the output of the monit status command (the complete output is attached):

The Monit daemon 5.2.4 uptime: 5h 37m

…… 

Process 'director'

  status                            running

  monitoring status                 monitored

  pid                               13095

  parent pid                        1

  uptime                            2m

  children                          0

  memory kilobytes                  17592

  memory kilobytes total            17592

  memory percent                    0.4%

  memory percent total              0.4%

  cpu percent                       12.0%

  cpu percent total                 12.0%

  data collected                    Sun Aug 10 21:10:55 2014

 

Process 'worker_1'

  status                            running

  monitoring status                 monitored

  pid                               13169

  parent pid                        1

  uptime                            0m

  children                          0

  memory kilobytes                  10404

  memory kilobytes total            10404

  memory percent                    0.2%

  memory percent total              0.2%

  cpu percent                       11.9%

  cpu percent total                 11.9%

  data collected                    Sun Aug 10 21:10:55 2014

 

Process 'worker_2'

  status                            running

  monitoring status                 monitored

  pid                               13175

  parent pid                        1

  uptime                            0m

  children                          0

  memory kilobytes                  10116

  memory kilobytes total            10116

  memory percent                    0.2%

  memory percent total              0.2%

  cpu percent                       12.0%

  cpu percent total                 12.0%

  data collected                    Sun Aug 10 21:10:55 2014

 

Process 'worker_3'

  status                            running

  monitoring status                 monitored

  pid                               13182

  parent pid                        1

  uptime                            0m

  children                          0

  memory kilobytes                  7492

  memory kilobytes total            7492

  memory percent                    0.1%

  memory percent total              0.1%

  cpu percent                       10.4%

  cpu percent total                 10.4%

  data collected                    Sun Aug 10 21:10:55 2014

-----

System 'system_bm-5f74f85f-4d1c-4752-9508-1ee98bae5776'

  status                            running

  monitoring status                 monitored

  load average                      [7.11] [6.22] [5.99]

  cpu                               33.3%us 51.4%sy 0.0%wa

  memory usage                      378148 kB [9.3%]

  swap usage                        0 kB [0.0%]

  data collected                    Sun Aug 10 21:10:55 2014
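To spot the restarting processes across repeated runs of monit status, the uptime of each process can be extracted from output like the above. This is a rough sketch: `parse_monit_uptimes` and `uptime_minutes` are hypothetical helpers, and the parsing assumes the "Process 'name'" headers and uptime values such as "2m" or "5h 37m" seen in this output.

```python
import re

HEADER = re.compile(r"^(\w+) '([^']+)'$")


def parse_monit_uptimes(text):
    """Map each process name in `monit status` output to its uptime string."""
    uptimes = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            kind, name = header.groups()
            # Track only "Process" sections; "System" sections are skipped.
            current = name if kind == "Process" else None
            continue
        fields = line.split()
        if current and len(fields) >= 2 and fields[0] == "uptime":
            uptimes[current] = " ".join(fields[1:])
    return uptimes


def uptime_minutes(uptime):
    """Convert an uptime string such as '5h 37m' or '2m' to whole minutes."""
    units = {"d": 1440, "h": 60, "m": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([dhm])", uptime))
```

Comparing these values with the Monit daemon uptime reported at the top of the output shows which processes have restarted since monit itself started.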

 

Is it expected that the director does not stay up for long and keeps restarting? What could cause such behavior?

I have attached the director configuration, as well as the log files for the director, worker_1, and monit.

Thank you for your help.

-Mohammad

 

 

director.stderr.log
director.yml
monit.log
monit.status.txt
worker_1.stderr.log