Struggling with resurrection


Michael

Jan 5, 2015, 7:29:03 PM
to bosh-...@cloudfoundry.org
I'm using BOSH 1.2798.0 and having trouble getting resurrection to work. You can see all my BOSH gem versions and the full manifest (including stemcell) in this gist: https://gist.github.com/mkb/bf284f9dee7d6ad515d3

I've got a block in my manifest like this:

properties:
  hm:
    resurrector_enabled: true
    resurrector:
      minimum_down_jobs: 1
      percent_threshold: 1

After deploying, I issue a `bosh vm resurrection sinatra N enable` for each instance. When I manually terminate an instance from the AWS console, I expect BOSH to notice the disappearance and reprovision it, but nothing happens. I can't find many docs or examples online, so I resorted to fiddling with the parameters, with no luck so far.

Any suggestions for how I might troubleshoot this further?

Thanks,

--mkb


Dmitriy Kalinin

Jan 6, 2015, 1:38:11 PM
to bosh-...@cloudfoundry.org
What do your logs in /var/vcap/sys/log/health_monitor/ (on the VM the health monitor is deployed to) show?

Michael Brodhead

Jan 6, 2015, 2:18:27 PM
to bosh-...@cloudfoundry.org
Aha! That's very helpful. I've posted the last 200 lines at: https://gist.github.com/mkb/345a4241f5f768c109c4

Here are the last several lines. I highlighted two which stood out to me:

E, [2015-01-06T19:06:26.078235 #2880] ERROR : Cannot get deployments from director at https://127.0.0.1:25555/deployments: 401 Not authorized
 
E, [2015-01-06T19:06:26.078419 #2880] ERROR : /var/vcap/packages/health_monitor/gem_home/ruby/2.1.0/gems/bosh-monitor-1.2801.0/lib/bosh/monitor/director.rb:17:in `get_deployments'
/var/vcap/packages/health_monitor/gem_home/ruby/2.1.0/gems/bosh-monitor-1.2801.0/lib/bosh/monitor/runner.rb:137:in `fetch_deployments'
/var/vcap/packages/health_monitor/gem_home/ruby/2.1.0/gems/bosh-monitor-1.2801.0/lib/bosh/monitor/runner.rb:97:in `block in poll_director'
I, [2015-01-06T19:06:51.768865 #2880] INFO : Analyzing agents...
W, [2015-01-06T19:06:51.769193 #2880] WARN : Agent 17232fd6-74ab-4347-8f35-f5ed756d2e73 is not a part of any deployment
I, [2015-01-06T19:06:51.769654 #2880] INFO : [ALERT] Alert @ 2015-01-06 19:06:51 UTC, severity 2: 17232fd6-74ab-4347-8f35-f5ed756d2e73 is not a part of any deployment
I, [2015-01-06T19:06:51.769804 #2880] INFO : Analyzed 1 agent, took 0.000669947 seconds
I, [2015-01-06T19:07:10.187439 #2880] INFO : Managing 0 deployments, 1 agent
I, [2015-01-06T19:07:10.187764 #2880] INFO : Agent heartbeats received = 1721

Does this mean I need to provide the health monitor with creds to access my director? The agent UUID in the second line I highlighted corresponds to the one VM I have running at the moment.
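
For anyone skimming a long health monitor log for the same symptoms, here is a minimal sketch that pulls out the ERROR and WARN entries. It assumes the line layout shown in the excerpt above (`E, [timestamp #pid] ERROR : message`); the function names are hypothetical and not part of BOSH.

```python
import re

# Matches lines shaped like the excerpt in this thread, e.g.
# "E, [2015-01-06T19:06:26.078235 #2880] ERROR : Cannot get deployments ..."
LOG_LINE = re.compile(
    r"^(?P<sev>[A-Z]), \[(?P<ts>\S+) #(?P<pid>\d+)\]\s+(?P<level>[A-Z]+)\s*: (?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match
    (stack trace continuation lines, for example, do not match)."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def problems(lines):
    """Collect (timestamp, level, message) for every ERROR or WARN entry."""
    found = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] in ("ERROR", "WARN"):
            found.append((entry["ts"], entry["level"], entry["msg"]))
    return found
```

Feeding it the lines of a file under /var/vcap/sys/log/health_monitor/ would surface entries such as the 401 and the "not a part of any deployment" warning without the surrounding INFO noise.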

Thanks,

--mkb

To unsubscribe from this group and stop receiving emails from it, send an email to bosh-users+...@cloudfoundry.org.

Dmitriy Kalinin

Jan 6, 2015, 2:20:56 PM
to bosh-...@cloudfoundry.org
Yep. Unfortunately, we currently do not auto-create the hm user on the Director. You should use `bosh create user` with the hm creds found in your bosh deployment manifest.
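
For readers debugging the same 401, here is a minimal sketch of checking a set of credentials against the Director. It assumes the Director accepts HTTP basic auth on the /deployments endpoint seen in the log earlier in the thread. The function names and the decision to skip certificate verification (on the assumption that the Director uses a self-signed certificate) are illustrative choices, not part of BOSH.

```python
import base64
import ssl
import urllib.error
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP basic auth Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def director_status_code(url, user, password, timeout=10):
    """Return the HTTP status code the Director gives for these credentials.

    Assumption: basic auth plus a self-signed certificate, so certificate
    verification is disabled here. Do not do that outside of a test setup.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 401 when the credentials are rejected
```

Run on the Director VM, something like `director_status_code("https://127.0.0.1:25555/deployments", "hm_user", "hm_password")` should come back as 401 until the hm user exists; `hm_user` and `hm_password` are placeholders for whatever the manifest contains.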

Dr Nic Williams

Jan 6, 2015, 2:23:37 PM
to bosh-...@cloudfoundry.org, bosh-...@cloudfoundry.org
Perhaps the first time "bosh create user" is run there should be a warning: "You've just broken the health monitor/resurrector. Please set up an HM user."




Dmitriy Kalinin

Jan 6, 2015, 2:29:08 PM
to bosh-...@cloudfoundry.org
I'm thinking of moving away from the current users model and only looking up users in the manifest as a simple solution. In addition to that, I also want to integrate UAA with bosh for auth. (Micro bosh will have its own copy of UAA colocated with it.)

Dr Nic Williams

Jan 6, 2015, 2:49:51 PM
to bosh-...@cloudfoundry.org, bosh-...@cloudfoundry.org
Sounds interesting!

Michael Brodhead

Jan 6, 2015, 3:37:30 PM
to bosh-...@cloudfoundry.org
OK, I created the user but it's not clear which config the credentials should go into. Adding them to my own app's manifest or the settings.yml for my bosh-bootstrap had no effect.

Secondary question: can you point me to some docs on passing properties from the command line? After reading everything I could find I still wasn't able to get that to work. While I am still evaluating bosh I don't mind putting cleartext creds into config files but for real production use I can't do that.

Thanks,

--mkb

Dr Nic Williams

Jan 6, 2015, 5:54:11 PM
to bosh-...@cloudfoundry.org, bosh-...@cloudfoundry.org
mkb, you'll now need to give up bosh-bootstrap (it doesn't have an option yet for a custom HM user, but that's a good idea) and edit micro_bosh.yml directly.

Michael Brodhead

Jan 6, 2015, 6:04:02 PM
to bosh-...@cloudfoundry.org
FWIW, when I recreated the admin:admin user the 401s went away, though the health monitor still was not recreating my instances. Presumably this is because I'm still not configuring it properly.

How does one deploy an updated micro_bosh.yml if not by invoking bosh-bootstrap?

Sorry if many of my questions seem utterly obvious. I honestly am making every attempt to RTFM but haven't found a lot of detail. I imagine that's a symptom of early days on a fast moving project.

--mkb

Dr Nic Williams

Jan 6, 2015, 6:09:20 PM
to bosh-...@cloudfoundry.org, bosh-...@cloudfoundry.org
Good question on re-deploying microbosh post-bootstrap.

Change into the newly created deployments folder. Then target the firstbosh folder:

cd deployments
bosh micro deployment firstbosh
bosh micro deploy firstbosh/<stemcell path>

Dr Nic Williams

Jan 6, 2015, 6:09:39 PM
to bosh-...@cloudfoundry.org, bosh-...@cloudfoundry.org
Not sure why HM didn't kick into gear once admin:admin was restored.

Michael Brodhead

Jan 6, 2015, 6:17:54 PM
to bosh-...@cloudfoundry.org
I'm guessing that's because my test deployment is tiny, so a single missing server exceeded the default 20% meltdown threshold.
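
To make the arithmetic behind that guess concrete, here is a minimal sketch of a meltdown check. It is not the actual bosh-monitor code. It assumes a deployment counts as melting down when at least `minimum_down_jobs` agents are down and the fraction of down agents meets or exceeds `percent_threshold`; the exact comparison and the function name are assumptions.

```python
def is_meltdown(down_agents, total_agents, minimum_down_jobs, percent_threshold):
    """Return True if the (assumed) meltdown rule would suppress resurrection.

    percent_threshold is a fraction, so 0.2 means 20% and 1 means 100%.
    """
    if total_agents == 0:
        return False  # nothing to judge
    if down_agents < minimum_down_jobs:
        return False  # too few failures to count as a meltdown
    return down_agents / total_agents >= percent_threshold


# One VM down out of one is 100% down, which is above a 20% threshold.
print(is_meltdown(down_agents=1, total_agents=1, minimum_down_jobs=1, percent_threshold=0.2))
# One VM down out of ten is 10% down, which is below a 20% threshold.
print(is_meltdown(down_agents=1, total_agents=10, minimum_down_jobs=1, percent_threshold=0.2))
```

Under these assumptions, a one-VM deployment always looks like a meltdown when its only VM disappears, whatever fraction between 0 and 1 is configured, unless `minimum_down_jobs` is raised above 1.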

--mkb


Dmitriy Kalinin

Jan 7, 2015, 12:46:30 PM
to bosh-...@cloudfoundry.org, m...@womply.com
That sounds right. Does tweaking those properties fix this for you?

Michael Brodhead

Jan 7, 2015, 2:33:34 PM
to Dmitriy Kalinin, bosh-...@cloudfoundry.org
It's not clear yet. Starting up my micro bosh initially gave an error about a missing hm property, which tells me I actually have them in the right place. I was able to fix that easily.

The bad news is that after recreating my bosh installation I am no longer able to set my deployment. Running from my local machine, most commands are able to interact with the director, but when I issue a `bosh deployment` I see a series of "cannot access director" warnings until execution eventually times out. So far this is 100% reproducible. Other commands work, but `bosh deployment` craps out.

You can see an example of that here: https://gist.github.com/mkb/bf89c02ac78eb9725f57

I would love suggestions on how to troubleshoot the problem or on what I might have done wrong.

Thanks,

--mkb

Dmitriy Kalinin

Jan 7, 2015, 2:49:26 PM
to m...@womply.com, bosh-...@cloudfoundry.org
Try running `bosh deployments` to see what the director has deployed. Make sure that your deployment manifest has the correct director_uuid (it should match what `bosh status` reports) and a matching name for the deployment.
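
The director_uuid comparison can also be scripted. The sketch below assumes the manifest keeps `director_uuid` as a top-level key on its own line and that you paste in the UUID reported by `bosh status` yourself; the function names and the sample UUIDs are hypothetical.

```python
import re


def manifest_director_uuid(manifest_text):
    """Extract the top-level director_uuid value from manifest text, or None."""
    match = re.search(
        r"^director_uuid:\s*['\"]?([0-9a-fA-F-]+)['\"]?\s*$",
        manifest_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def uuid_matches(manifest_text, director_uuid):
    """Compare the manifest's director_uuid with the one the Director reports."""
    found = manifest_director_uuid(manifest_text)
    return found is not None and found.lower() == director_uuid.strip().lower()


# Hypothetical manifest fragment and UUIDs, for illustration only.
manifest = "name: sinatra\ndirector_uuid: 00000000-0000-0000-0000-000000000001\n"
print(uuid_matches(manifest, "00000000-0000-0000-0000-000000000002"))
```

A mismatch here is the situation you get after recreating the Director: the manifest still carries the old UUID.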

Michael Brodhead

Jan 7, 2015, 9:00:00 PM
to Dmitriy Kalinin, bosh-...@cloudfoundry.org
Yep, that was it. I hadn't updated the director_uuid after recreating my bosh. I've been distracted by other work but tomorrow I can get back to making the health monitor work.

Thanks!

--mkb