Running Drain Script fails for api_z1 on vSphere with: no such file or directory


Johannes Hiemer

Mar 24, 2015, 5:25:44 PM
to bosh-...@cloudfoundry.org
Hi there,
I am having a serious issue with a CF deployment on vSphere. The complete deployment runs fine until api_z1 gets configured, at which point it fails with:

Started updating job api_z1 > api_z1/0 (canary). Failed: Action Failed get_task: Task 4e6d2c5a-2dc2-4535-5980-57717c2329a4 result: Running Drain Script: Running drain script: Starting command /var/vcap/jobs/cloud_controller_ng/bin/drain job_shutdown hash_unchanged: fork/exec /var/vcap/jobs/cloud_controller_ng/bin/drain: no such file or directory (00:00:01)

Error 450001: Action Failed get_task: Task 4e6d2c5a-2dc2-4535-5980-57717c2329a4 result: Running Drain Script: Running drain script: Starting command /var/vcap/jobs/cloud_controller_ng/bin/drain job_shutdown hash_unchanged: fork/exec /var/vcap/jobs/cloud_controller_ng/bin/drain: no such file or directory

Task 27 error

Any idea why this occurs? Currently I have no clue at all. I went into the machine and could find both the file and the directory. Executing `drain job_shutdown` manually with Ruby 2.1.4 fails as well.
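For reference, this is roughly what I am checking on the VM (illustrative commands only, nothing conclusive so far). A fork/exec "no such file or directory" on a file that does exist can also mean that the interpreter named in the script's shebang line is missing:

    ls -la /var/vcap/jobs/cloud_controller_ng/bin/drain
    head -1 /var/vcap/jobs/cloud_controller_ng/bin/drain   # does the shebang point at a ruby that exists?
    file /var/vcap/jobs/cloud_controller_ng/bin/drain      # CRLF line endings would also break the shebang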

Would be happy about any input.

Regards,
Johannes

Dmitriy Kalinin

Mar 24, 2015, 10:53:18 PM
to bosh-...@cloudfoundry.org
I would recommend terminating the VM from vSphere and running `bosh cck` to delete the VM reference. We are planning at some point to introduce a way to forcefully skip draining when a VM gets into a state where draining is not possible.
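Roughly like this (the exact cck prompts can differ between BOSH versions, so treat it as a sketch):

    # 1. power off / delete the broken api_z1/0 VM in the vSphere client
    # 2. let the Director clean up its record of that VM
    bosh cck          # choose the "Delete VM reference" resolution for the missing VM
    # 3. redeploy so the Director recreates the instance from scratch
    bosh deploy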

Johannes Hiemer

Mar 25, 2015, 3:39:12 AM
to bosh-...@cloudfoundry.org
Dmitriy, that worked. I removed the machine and deployed again, and everything went very smoothly. I am not sure why, though. Could you briefly explain this behaviour? I have never seen it on OpenStack.

Thanks a lot!

Dmitriy Kalinin

Mar 25, 2015, 5:44:11 PM
to bosh-...@cloudfoundry.org
This is not OpenStack-specific.

There are several situations in which the Director can end up in this state:
- there was a networking incident and the Director was not able to complete the "apply" operation against the VM
- there was an unrecoverable problem in the Agent while downloading/extracting packages on the VM

Both of these cases can leave a partially deployed VM behind (e.g. job assets are not fully placed onto the VM). Currently the Director does not try very hard to figure out whether it is appropriate to skip draining/stopping in such cases.
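If you want to confirm that on a broken VM, comparing what the Agent thinks it applied with what is actually on disk usually shows the gap (paths below assume a standard stemcell layout):

    cat /var/vcap/bosh/spec.json                      # last spec the Agent successfully applied
    ls -la /var/vcap/jobs/cloud_controller_ng/bin/    # job templates that actually made it onto disk
    monit summary                                     # which processes the Agent is currently managing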