Microbosh - director does not work

mohamma...@gmail.com

Aug 10, 2014, 5:29:26 PM

to bosh-...@cloudfoundry.org

Hi,

I have deployed micro-bosh on OpenStack using the stemcell “bosh-stemcell-2657-openstack-kvm-ubuntu-trusty-go_agent.tgz”, but I am not able to upload the stemcell to micro-bosh. The task either stays in the queue or, at best, reaches the state “Extracting stemcell archive” and hangs.

This is the output of “bosh task 7 --debug”:

I, [2014-08-10T00:30:31.694531 #6979] INFO -- : Director Version : 1.2657.0

I, [2014-08-10T00:30:31.695667 #6979] INFO -- : Enqueuing task: 7

I, [2014-08-10T00:35:47.315936 #7726] [0x13d1070] INFO -- : Looking for task with task id 7

D, [2014-08-10T00:35:47.413239 #7726] [0x13d1070] DEBUG -- : (0.033323s) SELECT * FROM "tasks" WHERE "id" = 7

I, [2014-08-10T00:35:47.565416 #7726] [0x13d1070] INFO -- : Starting task: 7

I, [2014-08-10T00:35:47.579955 #7726] [task:7] INFO -- : Creating job

D, [2014-08-10T00:37:22.955582 #7726] [task:7] DEBUG -- : (0.019012s) SELECT * FROM "tasks" WHERE "id" = 7

I, [2014-08-10T00:37:22.966266 #7726] [task:7] INFO -- : Performing task: 7

D, [2014-08-10T00:37:22.994471 #7726] [task:7] DEBUG -- : (0.006916s) BEGIN

D, [2014-08-10T00:37:23.079468 #7726] [task:7] DEBUG -- : (0.027711s) UPDATE "tasks" SET "state" = 'processing', "timestamp" = '2014-08-10 00:37:22.969425+0000', "description" = 'create stemcell', "result" = NULL, "output" = '/var/vcap/store/director/tasks/7', "user_id" = NULL, "checkpoint_time" = '2014-08-10 00:37:22.976624+0000', "type" = 'update_stemcell' WHERE ("id" = 7)

D, [2014-08-10T00:37:23.135070 #7726] [task:7] DEBUG -- : (0.047460s) COMMIT

I, [2014-08-10T00:37:23.137392 #7726] [task:7] INFO -- : Processing update stemcell

I, [2014-08-10T00:37:23.171104 #7726] [task:7] INFO -- : Extracting stemcell archive

I am new to this area, so I am not sure whether I am doing something wrong. I tried to investigate and noticed that the director and worker processes do not stay running for long; they seem to crash or shut down and then restart. For example, monit status consistently shows every process except the director and workers as running with a long uptime, while the status of the director and workers is unstable (“running” with only a few minutes of uptime, “not monitored”, or “failed to execute”).

I am not sure whether the director works properly after it restarts. The director.stderr.log shows this error message:

/var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:526:in `start_tcp_server': no acceptor (port is in use or requires root privileges) (RuntimeError)

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:526:in `start_server'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/tcp_server.rb:16:in `connect'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/base.rb:55:in `block in start'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `call'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'

from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/bosh-director-1.2657.0/bin/bosh-director:36:in `<top (required)>'

from /var/vcap/packages/director/bin/bosh-director:16:in `load'

from /var/vcap/packages/director/bin/bosh-director:16:in `<main>'
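
The "port is in use" half of that message can be checked directly on the micro-bosh VM by trying to bind the director's port while the director is down. The sketch below is only illustrative: the helper name `port_is_free` is made up, and port 25555 is an assumption (the usual director default); the real value should be taken from the attached director.yml.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP listener could bind to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR is what most servers set, so sockets lingering in
    # TIME_WAIT are not reported as "in use".
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        return True
    except OSError as exc:
        # EADDRINUSE: another process is listening on the port.
        # EACCES: the port needs privileges this user does not have.
        if exc.errno in (errno.EADDRINUSE, errno.EACCES):
            return False
        raise
    finally:
        sock.close()


if __name__ == "__main__":
    DIRECTOR_PORT = 25555  # assumption: default port, confirm in director.yml
    state = "free" if port_is_free(DIRECTOR_PORT) else "in use or not permitted"
    print("port", DIRECTOR_PORT, "is", state)
```

If the port is reported as in use while monit shows the director as not running, a stale director process may still be holding it, which would explain why each restart fails in `start_tcp_server`.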

Here is the output of the monit status command (the complete output is attached):

The Monit daemon 5.2.4 uptime: 5h 37m

……

Process 'director'

status running

monitoring status monitored

pid 13095

parent pid 1

uptime 2m

children 0

memory kilobytes 17592

memory kilobytes total 17592

memory percent 0.4%

memory percent total 0.4%

cpu percent 12.0%

cpu percent total 12.0%

data collected Sun Aug 10 21:10:55 2014

Process 'worker_1'

status running

monitoring status monitored

pid 13169

parent pid 1

uptime 0m

children 0

memory kilobytes 10404

memory kilobytes total 10404

memory percent 0.2%

memory percent total 0.2%

cpu percent 11.9%

cpu percent total 11.9%

data collected Sun Aug 10 21:10:55 2014

Process 'worker_2'

status running

monitoring status monitored

pid 13175

parent pid 1

uptime 0m

children 0

memory kilobytes 10116

memory kilobytes total 10116

memory percent 0.2%

memory percent total 0.2%

cpu percent 12.0%

cpu percent total 12.0%

data collected Sun Aug 10 21:10:55 2014

Process 'worker_3'

status running

monitoring status monitored

pid 13182

parent pid 1

uptime 0m

children 0

memory kilobytes 7492

memory kilobytes total 7492

memory percent 0.1%

memory percent total 0.1%

cpu percent 10.4%

cpu percent total 10.4%

data collected Sun Aug 10 21:10:55 2014

-----

System 'system_bm-5f74f85f-4d1c-4752-9508-1ee98bae5776'

status running

monitoring status monitored

load average [7.11] [6.22] [5.99]

cpu 33.3%us 51.4%sy 0.0%wa

memory usage 378148 kB [9.3%]

swap usage 0 kB [0.0%]

data collected Sun Aug 10 21:10:55 2014
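
To spot the restarting processes without reading the whole status dump, the uptimes can be pulled out with a small script. This is a rough sketch that assumes the `Process '<name>'` headers and `uptime` fields look like the output above (values such as `2m` or `5h 37m`); the function names and the five-minute threshold are arbitrary choices, not anything monit provides.

```python
import re

# Minutes per unit for uptime values such as "2m", "5h 37m" or "1d 2h 3m".
UNIT_MINUTES = {"d": 24 * 60, "h": 60, "m": 1}


def uptime_minutes(text):
    """Convert a monit-style uptime string to a number of minutes."""
    parts = re.findall(r"(\d+)\s*([dhm])", text)
    return sum(int(count) * UNIT_MINUTES[unit] for count, unit in parts)


def process_uptimes(status_text):
    """Map each process name in `monit status` output to its uptime in minutes."""
    uptimes = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = re.match(r"^(\w+) '(.+)'$", line)
        if header:
            # Only Process blocks are tracked; System and other blocks reset it.
            current = header.group(2) if header.group(1) == "Process" else None
            continue
        field = re.match(r"^uptime\s+(.+)$", line)
        if field and current is not None:
            uptimes[current] = uptime_minutes(field.group(1))
    return uptimes


def recently_restarted(status_text, threshold_minutes=5):
    """Return the names of processes whose uptime is below the threshold."""
    uptimes = process_uptimes(status_text)
    return sorted(name for name, mins in uptimes.items() if mins < threshold_minutes)
```

Running `recently_restarted(open("monit.status.txt").read())` against the attached status file should list only the processes that keep restarting.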

Is it expected that the director does not stay up for long and keeps restarting? What could cause such behavior?

I attached the director configuration, and I also attached log files for the director, worker_1, and monit.

Thank you for your help.

-Mohammad

director.stderr.log

director.yml

monit.log

monit.status.txt

worker_1.stderr.log
