Unresponsive agents

ferdy

Apr 15, 2012, 8:59:47 PM
to bosh-...@cloudfoundry.org
I'm unable to deploy a release because the agents don't respond:

-------------
Getting deployment properties from director...
Compiling deployment manifest...
Please review all changes carefully
Deploying `wordpress-aws.yml' to `yourboshname' (type 'yes' to continue): yes
Tracking task output for task#48...

Preparing deployment
  binding deployment (00:00:00)                                                                     
  binding release (00:00:00)                                                                        
  binding existing deployment: Timed out sending get_state to 83822ec9-2618-437f-aa7d-85eb6266a988 after 30 seconds (00:01:30)
Error                   3/7 00:01:30                                                                

The task has returned an error status, do you want to see debug log? [Yn]: 
-------------

I've tried to restart jobs, and no tasks are running. I've also used the 'bosh cck' command, but it's unable to solve the problem. Any ideas how to restart the agents?

-------------
Performing cloud check...

Scanning 4 VMs
  checking VM states (00:00:10)                                                                     
  0 OK, 4 unresponsive, 0 unbound, 0 out of sync (00:00:00)                                         
Done                    2/2 00:00:10                                                                

Scanning 0 persistent disks
  looking for inactive disks (00:00:00)                                                             
  0 OK, 0 inactive, 0 mount-info mismatch (00:00:00)                                                
Done                    2/2 00:00:00                                                                
Scan is complete, checking if any problems found...

Found 4 problems

Problem 1 of 4: Problem (unresponsive_agent 1) is no longer valid: VM `1' doesn't have a cloud id.
  1. Close problem
Please choose a resolution [1 - 1]: 1

Problem 2 of 4: Problem (unresponsive_agent 4) is no longer valid: VM `4' doesn't have a cloud id.
  1. Close problem
Please choose a resolution [1 - 1]: 1

Problem 3 of 4: Problem (unresponsive_agent 2) is no longer valid: VM `2' doesn't have a cloud id.
  1. Close problem
Please choose a resolution [1 - 1]: 1

Problem 4 of 4: Problem (unresponsive_agent 3) is no longer valid: VM `3' doesn't have a cloud id.
  1. Close problem
Please choose a resolution [1 - 1]: 1

Below is the list of resolutions you've provided
Please make sure everything is fine and confirm your changes

  1. Problem (unresponsive_agent 1) is no longer valid: VM `1' doesn't have a cloud id
     Close problem

  2. Problem (unresponsive_agent 4) is no longer valid: VM `4' doesn't have a cloud id
     Close problem

  3. Problem (unresponsive_agent 2) is no longer valid: VM `2' doesn't have a cloud id
     Close problem

  4. Problem (unresponsive_agent 3) is no longer valid: VM `3' doesn't have a cloud id
     Close problem

Apply resolutions? (type 'yes' to continue): yes
Applying resolutions...

Applying problem resolutions
  unresponsive_agent 1: Close problem (00:00:00)                                                    
  unresponsive_agent 3: Close problem (00:00:00)                                                    
  unresponsive_agent 2: Close problem (00:00:00)                                                    
  unresponsive_agent 4: Close problem (00:00:00)                                                    
Done                    4/4 00:00:00                                                                
Cloudcheck is finished
-------------

- ferdy

Dr Nic Williams

Apr 15, 2012, 9:23:12 PM
to bosh-...@cloudfoundry.org
I'm also seeing this. My wordpress.yml (for AWS) is at https://gist.github.com/62ec357482b92a807a2f

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Oleg Shaldibin

Apr 15, 2012, 9:26:42 PM
to bosh-...@cloudfoundry.org
This means your VMs were never created in the first place, so there might be a problem with your infrastructure or CPI. VM references are in the BOSH Director database but they don't have cloud ids, so BOSH is unable to do anything meaningful. We'll need more context: can you provide us with either your original 'create deployment' debug task log or your director hostname? You can get the debug task log with 'bosh tasks recent <n>', followed by 'bosh task <create-deployment-task-id> --debug'.
--
Best,
Oleg

Dr Nic Williams

Apr 15, 2012, 9:51:02 PM
to bosh-...@cloudfoundry.org
I've added debug output to my gist.

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Dr Nic Williams

Apr 15, 2012, 9:51:32 PM
to bosh-...@cloudfoundry.org
Yes, it doesn't look like the VMs got created.

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Oleg Shaldybin

Apr 15, 2012, 10:07:10 PM
to bosh-...@cloudfoundry.org
OK, looks like some compilation VMs got created in the DB but not in the infrastructure. Right now it's easiest just to delete this deployment and try deploying from scratch (you might need to use --force to delete it). We'll add a cloudcheck handler for this particular situation soon.
--
Best,
Oleg

Dr Nic Williams

Apr 15, 2012, 10:11:03 PM
to bosh-...@cloudfoundry.org
Ahh, I can imagine that happening, as I changed the compilation settings along the way (to make them AWS-specific). Deployment deleted and recreated, and it's underway. Thanks.

Nic

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Dr Nic Williams

Apr 15, 2012, 10:25:28 PM
to bosh-...@cloudfoundry.org
Things are looking good:

Compiling packages
  wordpress/0.1-dev (00:02:52)                                                                      
  mysql/0.1-dev (00:03:10)                                                                          
  mysqlclient/0.1-dev (00:02:50)                                                                    
  nginx/0.1-dev (00:05:08)                                                                          
apache2/0.1-dev                     |oooooooooooooooo        | 4/6 00:14:55  ETA: --:--:--          

Oooh, I forgot to add the "reuse" flag for compilation :)
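
For readers following along, a minimal sketch of what the "reuse" setting might look like in the manifest's compilation section. The key name `reuse_compilation_vms` is an assumption based on the BOSH manifest format and is not quoted anywhere in this thread; the `workers` and `network` values are placeholders.

```yaml
# Sketch only: key names assumed from the BOSH manifest format, values are placeholders.
compilation:
  workers: 2                    # placeholder: number of compilation VMs
  network: default              # placeholder: must match a network defined in the manifest
  reuse_compilation_vms: true   # assumed name of the "reuse" flag mentioned above
  cloud_properties:
    instance_type: m1.small     # instance type used elsewhere in this thread
```

With reuse enabled, the same compilation VMs would be kept for several packages instead of being created and destroyed per package, which is presumably why the compile times above look long.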

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Dr Nic Williams

Apr 15, 2012, 11:12:54 PM
to bosh-...@cloudfoundry.org
Next error (getting close!) - "Can't provide any defaults since this is a VIP network"

$ bosh deploy
Getting deployment properties from director...
Compiling deployment manifest...
Please review all changes carefully
Deploying `wordpress-aws.yml' to `yourboshname' (type 'yes' to continue): yes
Tracking task output for task#20...

Preparing deployment
  binding deployment (00:00:00)                                                                     
  binding release (00:00:00)                                                                        
  binding existing deployment (00:00:00)                                                            
  binding resource pools (00:00:00)                                                                 
  binding stemcells (00:00:00)                                                                      
  binding templates (00:00:00)                                                                      
  binding unallocated VMs (00:00:00)                                                                
  binding instance networks (00:00:00)                                                              
Done                    7/7 00:00:00                                                                

Preparing DNS
  binding DNS (00:00:00)                                                                            
Done                    1/1 00:00:00                                                                

Creating bound missing VMs
  common/0: Can't provide any defaults since this is a VIP network (00:00:00)                       
Error                   1/3 00:00:00                                                                


-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Vadim Spivak

Apr 15, 2012, 11:16:05 PM
to bosh-...@cloudfoundry.org
You have to provide a dynamic network as well, since a VIP network is not known to the guest.

-Vadim

Dr Nic Williams

Apr 15, 2012, 11:22:39 PM
to bosh-...@cloudfoundry.org
Currently I am combining:
* code reading - though it's still hard to figure out how the manifest file maps onto the various Bosh::Director::XXX models; and
* wild guessing of what schema to go with for an AWS configuration

Do you have any suggestions for changes to the manifest file (in the gist debug output)?

Nic

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Dr Nic Williams

Apr 15, 2012, 11:23:14 PM
to bosh-...@cloudfoundry.org
BTW, all the validation errors that it does give you are very helpful. Thanks for putting those in.

Nic

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Vadim Spivak

Apr 15, 2012, 11:41:22 PM
to bosh-...@cloudfoundry.org
Add the following to the network section for each job.

- name: default
  default: [dns, gateway]

-Vadim

Dr Nic Williams

Apr 16, 2012, 12:47:27 AM
to bosh-...@cloudfoundry.org
Thanks Vadim, I now have 3 lovely VMs booted.

Though, the system doesn't currently work - none currently have elastic IPs, and if I point my browser at each AWS-generated public dns, one of them shows a 504 nginx error (yep, the random nginx VM).

From chatting with Oleg on Friday, I think each VM needs a well-known elastic IP. How do I do this?

My attempts to give each job a specific VIP network with a static IP land me back at the error from before: "Can't provide any defaults since this is a VIP network"

The two manifests are at https://gist.github.com/75ced173309bba0d8b23. An example of a failing job is below:

jobs:
  - name: nginx
    template: nginx
    instances: 1
    resource_pool: common
    networks:
    - name: nginx
      default: [dns, gateway]
      static_ips:
        - 23.23.245.179
    cloud_properties:
      instance_type: m1.small


Thanks again for all help. Soon, we will have a lovely AWS tutorial :)

Nic

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Dr Nic Williams

Apr 16, 2012, 12:48:09 AM
to bosh-...@cloudfoundry.org
I applied my first changes to a deployment manifest and it is very cool to be shown the changes before they are applied.

-- 
Dr Nic Williams
Engine Yard, VP Developer Evangelism

Vadim Spivak

Apr 16, 2012, 1:40:35 PM
to bosh-users
You want to provide both a default network and a VIP network.

- name: nginx
  template: nginx
  instances: 1
  resource_pool: common
  networks:
  - name: default
    default: [dns, gateway]
  - name: nginx
    static_ips:
    - 23.23.245.179

Also, I don't think you need cloud_properties for the job since
they're not applicable there.

-Vadim
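
A minimal sketch of how the two networks referenced by the job above might be declared at the top level of the manifest. This assumes the AWS CPI conventions of the time: a `dynamic` network named `default`, which the guest knows about and which supplies DNS and gateway, and a `vip` network named `nginx` for the elastic IP. The network type names and the `security_groups` value are assumptions, not taken from this thread.

```yaml
# Sketch only: network types and cloud_properties are assumptions, not quoted from the thread.
networks:
  - name: default          # dynamic network: visible to the guest, provides dns/gateway defaults
    type: dynamic
    cloud_properties:
      security_groups:
        - default          # placeholder security group name
  - name: nginx            # VIP network: carries the elastic IP, cannot provide defaults
    type: vip
    cloud_properties: {}
```

The job then lists both networks, with `default: [dns, gateway]` only on the dynamic one and `static_ips` only on the VIP one, as in Vadim's example.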

On Apr 15, 2012, at 8:13 PM, Dr Nic Williams <drnicwilli...@gmail.com> wrote:
> Debug is at https://gist.github.com/87061a9b550cefbe7d16

Dr Nic Williams

Apr 17, 2012, 12:06:36 AM
to bosh-users