Hi Diego,
Thanks for the comprehensive info. It may be worth mentioning that we are using Rancher 1.1.2 with Cattle container orchestration, which basically uses an overlay network with integrated load balancing to direct traffic within the cluster. I am fairly sure, though, that I have ruled out any network- or platform-specific issues (touch wood) by running basic connectivity tests (nslookup, ping, telnet, etc.).
I noticed I am getting the same errors even when calling the /status admin API endpoint, so all entries below are Kong/NGINX error log lines from calling this specific URL (in my case, "http://localhost:13888/status", which maps to port 8001 inside the Kong container: 0.0.0.0:13888->8001/tcp).
I basically tested/simulated 3 different scenarios:
1. stopping the Postgres DB container (so "kong-db" can no longer be resolved):
[error] 106#0: [lua] responses.lua:101: [postgres error] kong-db could not be resolved (3: Host not found), client: 172.17.0.1, server: kong_admin, request: "GET /status HTTP/1.1", host: "localhost:13888"
2. leaving the Postgres DB container running, but without a running Postgres daemon inside:
[error] 109#0: [lua] responses.lua:101: [postgres error] connection refused, client: 172.17.0.1, server: kong_admin, request: "GET /status HTTP/1.1", host: "localhost:13888"
3. changing the pg_host in kong.conf from a hostname ("kong-db") to the Postgres DB container's IP address:
[error] 117#0: [lua] responses.lua:101: [postgres error] no route to host, client: 172.17.0.1, server: kong_admin, request: "GET /status HTTP/1.1", host: "localhost:13888"
In all 3 scenarios, I get the same error message on the client that I initially described (HTTP 500 "An unexpected error occurred"). So in comparison with your environment, Diego, mine doesn't seem to play as nicely :-(.
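For reference, here is a rough sketch of how the three failure modes above could be told apart from inside the Kong container, independently of Kong itself. It is only an illustration: the helper name `probe` is made up, and port 5432 is an assumption (the Postgres default), since the actual pg_port is not shown here.

```python
import errno
import socket


def probe(host, port=5432, timeout=3.0):
    """Try a plain TCP connection and classify how it fails (hypothetical helper)."""
    try:
        # Scenario 1: the hostname cannot be resolved at all.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "could not be resolved"

    family, socktype, proto, _, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
    except ConnectionRefusedError:
        # Scenario 2: host is reachable, but nothing listens on the port.
        return "connection refused"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # Scenario 3: no usable network path to the address.
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route to host"
        return "error: %s" % exc
    finally:
        sock.close()
    return "ok"


if __name__ == "__main__":
    # "kong-db" is the pg_host from kong.conf; 5432 is assumed, not confirmed.
    print("kong-db:5432 ->", probe("kong-db"))
```

Running something like this in the Kong container for both the hostname and the container IP would at least show whether the problem is visible at the plain TCP level or only through Kong.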
Thanks,
Stefan