I've got a pretty strange issue here. Imagine we have two servers, ServerA and ServerB. Last night, ServerB pulled down some configuration bits from our Puppet servers and tried to rename itself to ServerA.
How? Well, there are two things that may have triggered this behavior:
1. We use a custom Puppet Node Name fact to set our node names, rather than the hostnames:
[main]
...
# Use the fact 'puppet_node' as our node classifier rather than the hostname.
node_name = facter
node_name_fact = puppet_node
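For context, `puppet_node` is a custom fact, so each agent has to supply it somehow. One simple way (a sketch, not necessarily how ours is implemented; the path and value are assumptions based on the logs below) is an external fact file:

```
# /etc/facter/facts.d/puppet_node.txt -- external fact read by Facter on the agent
puppet_node=nsp_node_prod
```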
2. We have Nginx proxy_cache all of our GET/HEAD requests to avoid hammering the Puppet Master processes with calls to the mostly static content like templates:
# Never, ever, ever cache our certificate or API requests... always pass them to the puppet master.
location ~ /(.*)/certificate(.*)/(.*)$ { proxy_pass http://unicorn; }
# If a request comes in for the 'master' environment, do not cache it at all
location ~ /master/(.*)$ { proxy_pass http://unicorn; }
location / {
    # Cache all other requests to the Puppet Unicorn process for 10 minutes.
    proxy_cache nginx;
    proxy_cache_methods GET HEAD;
    proxy_cache_key "$scheme$proxy_host$request_uri";
    proxy_cache_valid 10m;
    proxy_cache_valid 404 1m;
    proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
    proxy_pass http://unicorn;
}
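Assuming the caching of node lookups really is the culprit, one possible fix (a sketch only; not verified against our setup) would be to bypass the cache for the node endpoint, mirroring the existing certificate exception, and to expose nginx's cache status for debugging:

```nginx
# Never cache node-classification lookups (/<env>/node/<name>); pass them
# straight to the Puppet master, like we already do for certificate requests.
location ~ ^/([^/]+)/node/(.*)$ { proxy_pass http://unicorn; }

# Optional: tag responses with HIT/MISS/BYPASS so cache problems like this
# show up directly in curl output or the access logs.
add_header X-Cache-Status $upstream_cache_status;
```

With that in place, repeated requests to /production/node/nsp_node_prod should all show `X-Cache-Status: BYPASS` rather than a MISS followed by HITs.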
Digging into the logs, it looks like we're caching a bit too much: we're actually caching the /<env>/node/<puppet node name> queries. Here you can see that we generate the result once, then return cached results for the next several queries:
"GET /production/node/nsp_node_prod? HTTP/1.1" 200 13834 "-" "-" 0.021
"GET /production/node/nsp_node_prod? HTTP/1.1" 200 13834 "-" "-" 0.000
"GET /production/node/nsp_node_prod? HTTP/1.1" 200 13834 "-" "-" 0.000
"GET /production/node/nsp_node_prod? HTTP/1.1" 200 13834 "-" "-" 0.000
"GET /production/node/nsp_node_prod? HTTP/1.1" 200 13834 "-" "-" 0.000
"GET /production/node/nsp_node_prod? HTTP/1.1" 200 13834 "-" "-" 0.000
So, I have two questions:
1. What is the purpose of calling the Node API? Is the agent doing this? Why?
2. Is it possible that if an agent called the node API and got "its own node information" that was wrong, it could then request an invalid catalog?
(Note: we're running Puppet 3.4.3 behind Nginx with Unicorn. And yes, even though we use a single node name for these machines, they use different facts to define which packages and roles they serve up.)
Matt Wise
Sr. Systems Architect
Nextdoor.com