linkerd admin functionality


daniel....@parkhub.com

Apr 17, 2017, 3:30:49 PM
to linkerd-users
Hello!

I'm wondering how the linkerd admin page generates its data. I have a setup similar to the ingress setup in the Dogfood example, where NGINX sits in front of Linkerd and routes traffic to it, except that I use the NGINX ingress controller and create ingress resources that point to a Linkerd instance based on path/host. It worked fine until I started testing HTTP requests to a service that wasn't properly configured via Dtabs.

The behavior I was seeing caused Linkerd's admin page to show faulty numbers. For example, a GET request would send back the Linkerd error "no such host exists..." because the Dtab wasn't there. That's fine, I expected that. I get the same error for PUT/GET/DELETE requests. The admin request count registers them and quickly goes back to 0, which is normal from what I've seen. The issue arises with POST requests. When I attempt a POST request, I get an NGINX Bad Gateway error, as if the upstream server (in this case Linkerd) did not give a response. Furthermore, this causes the admin page to start showing negative counts.

After some digging, I found that the admin page polls for metrics.json and can get it from any of the Linkerd pods. Under normal circumstances the numbers would be consistent, because a failed request is retried and hits all three pods. NGINX, however, no longer retries POST requests by default. This means that a failed POST request is not retried, and because the NGINX ingress controller is in charge of this, only ONE Linkerd pod is hit with the request. This causes the other pods' logs to be out of sync, which in turn causes the metrics.json files to differ.

Is the behavior "normal" in the sense that the Admin page is expected to retrieve logs from any Linkerd pod? If so, that would mean that a default NGINX proxy will cause failing POST requests to throw the metrics.json count out of sync.

Kevin Lingerfelt

Apr 17, 2017, 5:17:36 PM
to daniel....@parkhub.com, linkerd-users
Hey Daniel,

Interesting -- this is a good find. It looks like when you load the linkerd dashboard via a Kubernetes service config, that request is load balanced by the service to any one of your running linkerd pods. When the dashboard re-fetches metrics, that request is also load balanced, and there's no guarantee that the pod that initially served the dashboard also serves the subsequent metrics response. I have no doubt that this could cause display issues with the delta metrics that are calculated on the dashboard.
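
To make the failure mode concrete, here is a minimal sketch in Python of how a delta computed from consecutive polls can go negative when the polls land on different pods. The pod names, counter values, round-robin ordering, and the `deltas` helper are all hypothetical; this is not the dashboard's actual code, only the same kind of arithmetic.

```python
from itertools import cycle

# Hypothetical cumulative request counters per pod. Only l5d-a received the
# failed POST requests, so its counter is ahead of the other two.
pod_counters = {"l5d-a": 12, "l5d-b": 7, "l5d-c": 7}


def deltas(samples):
    """Return the difference between each pair of consecutive samples."""
    return [curr - prev for prev, curr in zip(samples, samples[1:])]


# Polling through a load-balanced service: each poll may hit a different pod.
# Round-robin is an assumption made here to keep the example deterministic.
pods = cycle(pod_counters)
balanced = [pod_counters[next(pods)] for _ in range(4)]

# Polling through a tunnel to one pod: every poll reads the same counter.
pinned = [pod_counters["l5d-a"] for _ in range(4)]

print("balanced deltas:", deltas(balanced))
print("pinned deltas:  ", deltas(pinned))
```

With these made-up numbers, the load-balanced polls read 12, 7, 7, 12, so the first delta is negative even though no counter ever decreased. The pinned polls never produce a negative delta.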

For your setup, it would be better to tunnel directly to one of your l5d pods, and load the dashboard via the tunnel. That will guarantee that each metrics refresh is served from the same instance. For example, for the l5d pod "l5d-1nvrp", I run:

kubectl port-forward l5d-1nvrp 9990:9990

And then load the dashboard on localhost:9990. If after creating the tunnel you're still seeing issues with the stats, we should probably open up an issue to investigate.
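
If you want to check the numbers outside the dashboard, a rough sketch like the following could compare two metrics snapshots taken through the tunnel. The `/admin/metrics.json` path, the assumption that request counters have names ending in `/requests`, and both helper functions are my assumptions for illustration, so adjust them to what your instance actually serves.

```python
import json
from urllib.request import urlopen

# Assumption: the kubectl port-forward tunnel is open on localhost:9990 and
# the admin server exposes its metrics at /admin/metrics.json.
METRICS_URL = "http://localhost:9990/admin/metrics.json"


def fetch_metrics(url=METRICS_URL):
    """Fetch one metrics snapshot as a dict of metric name -> value."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def negative_request_deltas(prev, curr, suffix="/requests"):
    """Return request counters whose value decreased between two snapshots.

    Only names ending in `suffix` are checked (a naming assumption), since
    other metrics such as gauges may legitimately go down.
    """
    return {
        name: curr[name] - prev[name]
        for name in prev
        if name.endswith(suffix) and name in curr and curr[name] < prev[name]
    }
```

Calling `fetch_metrics()` twice a few seconds apart and passing both results to `negative_request_deltas` should return an empty dict when every poll is served by the same pod. A non-empty result through the tunnel would be worth reporting.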

Hope that helps.

Kevin
