bosh cf compilation failure due to unreachable NATS

nima.k...@gmail.com

Dec 4, 2013, 2:06:49 AM
to bosh-...@cloudfoundry.org
Hi all,

I am installing Cloud Foundry with BOSH on OpenStack.

The issue is that VM instances inside OpenStack cannot reach one another through their floating IPs. In my deployment, the settings.json for the compilation workers references NATS on the microbosh via its floating IP. As a result, the workers cannot exchange PING/PONG with NATS and compilation fails.

I do not have much control over the network configuration of the OpenStack deployment, so most likely I need to solve this from the BOSH side. I believe the solution would be to use floating IPs for external access and local IPs for internal communication. So, I have the following questions:
  • Is there a network setup I can define in the BOSH deployment yml that gets the compilation workers to communicate back with the microbosh NATS using the local IPs? If yes, can someone share a sample yml file?
  • If not, what is the workaround? Can I modify the OpenStack registry so that it feeds the local IP of the NATS server to the compilation workers?
Any help is much appreciated.
-Nima

Ferran Rodenas

Dec 4, 2013, 2:30:15 AM
to bosh-...@cloudfoundry.org
One workaround is to create a VM on the same network, install Ruby and bosh_cli on it, and then use that VM to install the microbosh. If you're on the same network, you don't need floating IPs (don't set the 'vip' parameter in micro_bosh.yml). Later you can manually assign a floating IP to the microbosh VM.

Another workaround is to ssh into the microbosh VM and manually update the /var/vcap/jobs/director/config/director.yml.erb file, changing the floating IPs to the private IPs (and don't forget to restart the director process: monit restart director).
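For illustration, the IP substitution in that file can be sketched in Python as below. This is a minimal sketch, assuming you already know both the floating and the private address; the function name, the addresses, and the sample text are hypothetical and are not the real contents of director.yml.erb. Back up the file and review the result before writing it back, then restart the director.

```python
import re


def swap_ip(config_text: str, floating_ip: str, private_ip: str) -> str:
    """Replace every whole-address occurrence of floating_ip with private_ip."""
    # re.escape makes the dots literal; the lookarounds stop 10.0.0.1 from
    # matching inside longer addresses such as 10.0.0.15 or 110.0.0.1.
    pattern = re.compile(r"(?<![\d.])" + re.escape(floating_ip) + r"(?![\d.])")
    return pattern.sub(private_ip, config_text)


if __name__ == "__main__":
    # Hypothetical lines only; the real director.yml.erb looks different.
    sample = "nats_host: 203.0.113.10\nother_host: 203.0.113.100\n"
    print(swap_ip(sample, "203.0.113.10", "10.0.0.5"))
```

Matching on whole addresses rather than doing a plain string replace avoids corrupting a longer IP that happens to start with the floating one.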

- Ferdy



nima.k...@gmail.com

Dec 6, 2013, 4:20:09 PM
to bosh-...@cloudfoundry.org
Thanks a lot for the tip, Ferdy. Compilation went through after I restarted the director.

It all goes fine until it gets to updating the core job.

At this point, I face the following issues:
  • uaa-cf-registrar fails to start with a connection error like the following:
    • the scheme nats does not accept registry part: nats:xx...@0.core.default.nbcf.microbosh:4222 (or bad hostname?) (URI::InvalidURIError)
  • Other components disappear with the following error:
2013-12-06_21:09:04.57083 Service: health_manager_next
2013-12-06_21:09:04.57083           Event: Does not exist
2013-12-06_21:09:04.57084           Action: restart
2013-12-06_21:09:04.57084           Date: Fri, 06 Dec 2013 21:09:04 +0000
2013-12-06_21:09:04.57084           Description: process is not running

I can telnet to 0.core.default.nbcf.microbosh on port 4222 and get the NATS authentication verification message, so the NATS server is running on the target host. I am not sure what causes the problem. I have also attached the deployment file. It is a minimal deployment with only 3 machines involved.
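The telnet check can also be scripted. Below is a minimal Python sketch, assuming only the NATS protocol behaviour that the server sends an INFO line as soon as a client connects; the function name is hypothetical. A successful result shows that the host and port are reachable and that a NATS server is answering; it says nothing about whether a client builds its connection URI correctly from the configured credentials.

```python
import socket


def nats_info_line(host: str, port: int = 4222, timeout: float = 5.0) -> str:
    """Connect to a NATS server and return the INFO line it sends on connect."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        buf = b""
        # NATS protocol lines end with CRLF; read until the first one arrives.
        while b"\r\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("connection closed before INFO was received")
            buf += chunk
    line = buf.split(b"\r\n", 1)[0].decode("utf-8", errors="replace")
    if not line.startswith("INFO"):
        raise ValueError(f"unexpected greeting: {line!r}")
    return line
```

For example, running nats_info_line("0.core.default.nbcf.microbosh") from a worker VM should return the same INFO line that telnet prints, and raises an exception if the port is unreachable or something other than NATS answers.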

Any idea what is wrong?

thanks a lot,
Nima
cf-deploy.yml

Ferran Rodenas

Dec 6, 2013, 5:33:30 PM
to bosh-...@cloudfoundry.org
Try NOT to use the '@' symbol in passwords ;) The authority part of the URI is not parsed correctly, because the '@' symbol marks the end of the user:password part of the authority.
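In general, reserved characters in the user or password part of a URI must be percent-encoded ('@' becomes %40). The Python sketch below shows this for a nats:// URI; the function name and the values are hypothetical. It only illustrates the URI rule: the failing component here was Ruby code (URI::InvalidURIError), and whether a given CF component decodes percent-encoded credentials is not established in this thread, so choosing a password without '@', as suggested above, remains the safer fix.

```python
from urllib.parse import quote, unquote, urlsplit


def nats_uri(user: str, password: str, host: str, port: int = 4222) -> str:
    """Build a nats:// URI with percent-encoded credentials.

    Unencoded '@', ':' or '/' in the user or password would otherwise be
    read as URI delimiters and break parsing of the host part.
    """
    # safe="" makes quote() encode '/' as well; by default it leaves '/' alone.
    return f"nats://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


if __name__ == "__main__":
    uri = nats_uri("nats", "p@ss", "10.0.0.5")
    parts = urlsplit(uri)
    print(uri)                         # nats://nats:p%40ss@10.0.0.5:4222
    print(parts.hostname, parts.port)  # 10.0.0.5 4222
    print(unquote(parts.password))     # p@ss
```

With the credentials encoded, the authority contains exactly one '@', so a parser can unambiguously find where the user:password part ends and the host begins.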

- Ferdy

Nima Kaviani

Dec 6, 2013, 6:38:20 PM
to bosh-...@cloudfoundry.org
Oh man! I had been looking into all sorts of IP resolution problems. Anyway, that solved the problem and I have my deployment running.

thanks so much Ferdy.
-Nima

James Bayer

Dec 7, 2013, 7:02:48 PM
to bosh-...@cloudfoundry.org
Thank you,

James Bayer