Hi,
I have deployed micro-bosh on OpenStack using the stemcell “bosh-stemcell-2657-openstack-kvm-ubuntu-trusty-go_agent.tgz”, but I am not able to upload the stemcell to micro-bosh. The task either stays in the queue or, at best, reaches the state “Extracting stemcell archive” and hangs.
This is the output of “bosh task 7 --debug”:
I, [2014-08-10T00:30:31.694531 #6979] INFO -- : Director Version : 1.2657.0
I, [2014-08-10T00:30:31.695667 #6979] INFO -- : Enqueuing task: 7
I, [2014-08-10T00:35:47.315936 #7726] [0x13d1070] INFO -- : Looking for task with task id 7
D, [2014-08-10T00:35:47.413239 #7726] [0x13d1070] DEBUG -- : (0.033323s) SELECT * FROM "tasks" WHERE "id" = 7
I, [2014-08-10T00:35:47.565416 #7726] [0x13d1070] INFO -- : Starting task: 7
I, [2014-08-10T00:35:47.579955 #7726] [task:7] INFO -- : Creating job
D, [2014-08-10T00:37:22.955582 #7726] [task:7] DEBUG -- : (0.019012s) SELECT * FROM "tasks" WHERE "id" = 7
I, [2014-08-10T00:37:22.966266 #7726] [task:7] INFO -- : Performing task: 7
D, [2014-08-10T00:37:22.994471 #7726] [task:7] DEBUG -- : (0.006916s) BEGIN
D, [2014-08-10T00:37:23.079468 #7726] [task:7] DEBUG -- : (0.027711s) UPDATE "tasks" SET "state" = 'processing', "timestamp" = '2014-08-10 00:37:22.969425+0000', "description" = 'create stemcell', "result" = NULL, "output" = '/var/vcap/store/director/tasks/7', "user_id" = NULL, "checkpoint_time" = '2014-08-10 00:37:22.976624+0000', "type" = 'update_stemcell' WHERE ("id" = 7)
D, [2014-08-10T00:37:23.135070 #7726] [task:7] DEBUG -- : (0.047460s) COMMIT
I, [2014-08-10T00:37:23.137392 #7726] [task:7] INFO -- : Processing update stemcell
I, [2014-08-10T00:37:23.171104 #7726] [task:7] INFO -- : Extracting stemcell archive
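Looking at the timestamps above, the task was enqueued at 00:30:31 but only picked up at 00:35:47, and there is another gap of roughly 95 seconds between “Creating job” and “Performing task”. To spot such stalls in the full attached logs, a small Python sketch could be used. It assumes the lines follow the Ruby Logger layout shown above ("[YYYY-MM-DDTHH:MM:SS.ffffff #pid]"); the function name log_gaps is my own and not part of BOSH.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches the timestamp in Ruby Logger lines such as
# "I, [2014-08-10T00:30:31.694531 #6979] INFO -- : ..."
_TIMESTAMP_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) #")


def log_gaps(lines: List[str], min_seconds: float) -> List[Tuple[float, str, str]]:
    """Return (gap_seconds, previous_line, line) for consecutive
    timestamped lines that are at least min_seconds apart."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = _TIMESTAMP_RE.search(line)
        if not match:
            continue  # skip lines without a Logger timestamp (e.g. stack traces)
        when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_time is not None:
            gap = (when - prev_time).total_seconds()
            if gap >= min_seconds:
                gaps.append((gap, prev_line, line))
        prev_time, prev_line = when, line
    return gaps
```

For example, reading the task debug log with open(...).read().splitlines() and calling log_gaps(lines, 60) would list every pause of a minute or more together with the two lines around it.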
I am new to this area, so I am not sure whether I am doing something wrong. I tried to investigate and noticed that the director and worker processes do not stay up for long; they seem to crash or shut down and then restart. For example, monit status consistently shows all processes except the director and workers running with a long uptime, while the director and workers are unstable: they show as running with only a few minutes of uptime, as “not monitored”, or as “failed to execute”.
I am not sure whether the director works properly after it restarts; director.stderr.log shows this error message:
/var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:526:in `start_tcp_server': no acceptor (port is in use or requires root privileges) (RuntimeError)
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:526:in `start_server'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/tcp_server.rb:16:in `connect'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/base.rb:55:in `block in start'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `call'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'
from /var/vcap/packages/director/gem_home/ruby/1.9.1/gems/bosh-director-1.2657.0/bin/bosh-director:36:in `<top (required)>'
from /var/vcap/packages/director/bin/bosh-director:16:in `load'
from /var/vcap/packages/director/bin/bosh-director:16:in `<main>'
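As far as I understand, EventMachine raises “no acceptor” when it cannot bind the listening socket, which matches the two causes in the message: the port is already taken (for example by a previous director instance that has not fully exited) or it is a privileged port. To check this on the VM, a minimal Python sketch like the following could try the same bind. The function name bind_problem is hypothetical, and the port to test would have to be taken from the director configuration; I have not confirmed which port the director uses.

```python
import errno
import socket
from typing import Optional


def bind_problem(host: str, port: int) -> Optional[str]:
    """Try to bind host:port the way a TCP server would.

    Returns None if the bind succeeds, otherwise a short reason that
    mirrors the two causes named in EventMachine's "no acceptor" error.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return "port is in use"
        if exc.errno == errno.EACCES:
            return "requires root privileges"
        raise  # any other error is unexpected; surface it
    finally:
        sock.close()  # release the port again; we only probe it
    return None
```

If bind_problem("0.0.0.0", port) returns "port is in use" while monit reports the director as not running, that would suggest a stale process is still holding the port.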
Here is the output of the “monit status” command (the complete output is attached):
The Monit daemon 5.2.4 uptime: 5h 37m
……
Process 'director'
status running
monitoring status monitored
pid 13095
parent pid 1
uptime 2m
children 0
memory kilobytes 17592
memory kilobytes total 17592
memory percent 0.4%
memory percent total 0.4%
cpu percent 12.0%
cpu percent total 12.0%
data collected Sun Aug 10 21:10:55 2014
Process 'worker_1'
status running
monitoring status monitored
pid 13169
parent pid 1
uptime 0m
children 0
memory kilobytes 10404
memory kilobytes total 10404
memory percent 0.2%
memory percent total 0.2%
cpu percent 11.9%
cpu percent total 11.9%
data collected Sun Aug 10 21:10:55 2014
Process 'worker_2'
status running
monitoring status monitored
pid 13175
parent pid 1
uptime 0m
children 0
memory kilobytes 10116
memory kilobytes total 10116
memory percent 0.2%
memory percent total 0.2%
cpu percent 12.0%
cpu percent total 12.0%
data collected Sun Aug 10 21:10:55 2014
Process 'worker_3'
status running
monitoring status monitored
pid 13182
parent pid 1
uptime 0m
children 0
memory kilobytes 7492
memory kilobytes total 7492
memory percent 0.1%
memory percent total 0.1%
cpu percent 10.4%
cpu percent total 10.4%
data collected Sun Aug 10 21:10:55 2014
-----
System 'system_bm-5f74f85f-4d1c-4752-9508-1ee98bae5776'
status running
monitoring status monitored
load average [7.11] [6.22] [5.99]
cpu 33.3%us 51.4%sy 0.0%wa
memory usage 378148 kB [9.3%]
swap usage 0 kB [0.0%]
data collected Sun Aug 10 21:10:55 2014
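To follow the restarts over time instead of reading “monit status” by hand each time, a small Python sketch could parse output in the format shown above and report each process's uptime in minutes. It assumes each block starts with a line such as “Process 'director'” and contains an “uptime” line with d/h/m units (as in “2m” or “5h 37m”); the function names are my own.

```python
import re
from typing import Dict, Optional

# Block headers look like "Process 'director'" or "System 'system_...'".
_HEADER_RE = re.compile(r"^(\w+) '([^']+)'$")
# Per-process uptime lines look like "uptime 2m" or "uptime 5h 37m".
_UPTIME_RE = re.compile(r"^uptime\s+(.+)$")
_UNIT_MINUTES = {"d": 24 * 60, "h": 60, "m": 1}


def uptime_minutes(value: str) -> int:
    """Convert a monit uptime string such as '5h 37m' or '2m' to minutes."""
    total = 0
    for amount, unit in re.findall(r"(\d+)([dhm])", value):
        total += int(amount) * _UNIT_MINUTES[unit]
    return total


def process_uptimes(status_text: str) -> Dict[str, int]:
    """Map each "Process 'name'" block in monit status output to its uptime."""
    uptimes: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = _HEADER_RE.match(line)
        if header:
            # Only Process blocks carry a per-process uptime.
            current = header.group(2) if header.group(1) == "Process" else None
            continue
        uptime = _UPTIME_RE.match(line)
        if current is not None and uptime:
            uptimes[current] = uptime_minutes(uptime.group(1))
    return uptimes
```

Running this against “monit status” every few minutes would show whether director and worker_N keep resetting to a few minutes while the other jobs keep accumulating uptime.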
Is it expected that the director does not stay up for long and keeps restarting? What could cause this behavior?
I have attached the director configuration as well as the log files for the director, worker_1, and monit.
Thank you for your help.
-Mohammad