Hello!
I'm wondering how the Linkerd admin page generates its data. My set-up is similar to the ingress set-up in the Dogfood example, where NGINX sits in front of Linkerd and routes traffic to it, except that I use the NGINX ingress controller and create ingress resources that point to a Linkerd instance based on path/host. It worked fine until I started testing HTTP requests to a service that wasn't properly configured via Dtabs.
The behavior I saw caused Linkerd's admin page to show faulty numbers. For example, a GET request returns the Linkerd error "no such host exists..." because the Dtab isn't there. That's fine, I expected that. I get the same error for PUT/GET/DELETE requests. The admin request count registers them and quickly goes back to 0, which is normal from what I've seen. The issue arises with POST requests. When I make a POST request, I get an NGINX Bad Gateway error, as if the upstream server (in this case Linkerd) did not give a response. Furthermore, this causes the admin page to start showing negative request counts.
After some digging, I found that the admin page polls for metrics.json and can get it from any of the Linkerd pods. Under normal circumstances, the numbers stay consistent if a failed request hits all three pods. NGINX, however, now defaults to not retrying POST requests. This means that a failed POST request is not retried, and because the NGINX ingress controller is in charge of this, only ONE Linkerd pod is hit with the request. This causes the other pods' logs to be out of sync, which in turn causes the metrics.json files to differ.
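To illustrate what I think is happening, here is a minimal Python sketch. It assumes (I have not confirmed this in the admin code) that the page derives its request count from the difference between successive metrics.json polls, and that each poll may land on a different pod. The pod names and counter values are made up for the example.

```python
from itertools import cycle, islice

# Hypothetical cumulative request counters per pod.
# One failed POST reached only pod-b, so its counter is one ahead.
pod_counters = {"pod-a": 10, "pod-b": 11, "pod-c": 10}


def poll_deltas(counters, poll_order, polls):
    """Return the change between successive polls, as a naive client would compute it."""
    deltas = []
    previous = None
    for pod in islice(cycle(poll_order), polls):
        current = counters[pod]  # stands in for fetching metrics.json from this pod
        if previous is not None:
            deltas.append(current - previous)
        previous = current
    return deltas


# Assumption: polls rotate across the pods (e.g. round-robin load balancing).
deltas = poll_deltas(pod_counters, ["pod-a", "pod-b", "pod-c"], 6)
print(deltas)  # [1, -1, 0, 1, -1]
```

If every pod had the same counter value, every delta would be 0. With one pod ahead, any poll that moves from that pod to another one produces a negative delta, which would match the negative counts I'm seeing.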
Is this behavior "normal" in the sense that the admin page is expected to retrieve metrics.json from any Linkerd pod? If so, that would mean that a default NGINX proxy will cause failing POST requests to throw the metrics.json counts out of sync.