HM9000 fails to start

brando...@newwave-technologies.com

Mar 20, 2014, 3:10:18 PM
to vcap...@cloudfoundry.org
Hi all,

I'm attempting to launch CF (v160) using the hm9000 package. Everything compiles correctly, but when I run bosh deploy, the task fails, saying that the health manager hasn't started after update. Looking through the logs, every hm9000 job except the API server fails with some variation of the following error message (a few also indicate that their respective daemons are down):

{"timestamp":1395341886.539936066,"process_id":25507,"source":"vcap.hm9000.listener","log_level":"info","message":"Acquiring lock for listener","data":null}
{"timestamp":1395341888.944651365,"process_id":25507,"source":"vcap.hm9000.listener","log_level":"error","message":"Failed to talk to lock store - Error:Store request timed out","data":null}

I've tried to restart each job directly using monit but the same thing happens. I've attached my manifest just in case something is wrong there, but I've been following the specs pretty closely. I'm on OpenStack, using the 2200 Ubuntu kvm stemcell. All my CLIs are up to date. Any help would be greatly appreciated, thanks.

Brandon
cf-160.yml

Matthew Kocher

Mar 21, 2014, 6:11:30 PM
to vcap...@cloudfoundry.org
It seems like hm9000 can't talk to etcd. Can you curl etcd from your hm9000 instance?

brando...@newwave-technologies.com

Mar 25, 2014, 3:51:08 PM
to vcap...@cloudfoundry.org
Sorry, just had time to come back to this. It looks like a test curl command returns successfully (curl http://200.200.200.4:4001/version), and the etcd job never fails during deployment. Any other ideas?

brando...@newwave-technologies.com

Mar 25, 2014, 4:16:49 PM
to vcap...@cloudfoundry.org
OK, it turns out that, because I don't know anything about etcd, I missed that the fix was to add another node. I figured this out after looking at the etcd logs and seeing that the single node was trying to contact itself to set up a cluster. I added a second node, and the deploy and subsequent login went fine. Blerg.
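
For anyone landing here with the same symptom: a successful GET on /version only shows that etcd's HTTP listener is reachable, not that the store can accept writes, which is presumably what acquiring the lock needs. Below is a rough sketch of a write probe in Python. It assumes the etcd build in this release exposes the v2 keys API on port 4001; the key name hm9000-probe, the 5-second TTL, and the helper names are made up for illustration and are not anything hm9000 itself uses.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_write_probe(base_url, key="hm9000-probe", ttl=5):
    """Build a PUT request that writes a short-lived key via the etcd v2 keys API.

    The key name and TTL are hypothetical; pick anything that will not
    collide with real data in the store.
    """
    url = f"{base_url.rstrip('/')}/v2/keys/{key}"
    body = urllib.parse.urlencode({"value": "ok", "ttl": ttl}).encode()
    return urllib.request.Request(url, data=body, method="PUT")


def can_write(base_url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True if etcd answers the write with a 2xx status within `timeout` seconds.

    Timeouts, refused connections, and HTTP error statuses all count as False.
    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(build_write_probe(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False
```

Calling can_write("http://200.200.200.4:4001") from the hm9000 instance would be the way to use it. If that returns False while the /version curl still succeeds, the problem is more likely the state of the etcd cluster than the network path to it.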

d.v.fro...@gmail.com

Jun 2, 2014, 5:01:57 AM
to vcap...@cloudfoundry.org
Hi Adams, I have the same problem. Please tell me about your solution: what exactly did you do?

On Wednesday, March 26, 2014, 3:16:49 UTC+7, Brandon Adams wrote: