Re: Prometheus unable to retrieve JMX Exporter metrics when Istio Sidecar enabled

Leo Li

Aug 23, 2019, 5:08:47 PM
to Prometheus Users
It seems there is a bug in version 0.11, which we are currently using. The bug was reported as fixed in version 0.12. Just FYI: https://github.com/prometheus/jmx_exporter/issues/304
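Once 0.12.0 is available to us, picking it up should just be a matter of swapping the agent jar. A rough sketch of the javaagent invocation (the jar path, the exporter config path, and the application jar are placeholders for our deployment; 7000 is the port the scrape jobs below target):

    # Sketch only: pin the agent to 0.12.0; paths are placeholders,
    # 7000 is the port the scrape jobs below target.
    java -javaagent:/opt/jmx_prometheus_javaagent-0.12.0.jar=7000:/opt/jmx-exporter/config.yaml \
         -jar app.jar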


On Thursday, August 22, 2019 at 4:30:56 PM UTC-4, Leo Li wrote:
Hi there,

I keep getting "server returned HTTP status 503 Service Unavailable" on my JMX Exporter scrape job for pods that have the Istio sidecar enabled. Has anyone encountered a similar issue before?

Here is what I found:
1. It works fine if I disable the Istio sidecar;
2. I am able to fetch the metrics fine with curl from any pod in the same k8s cluster, even from the Prometheus pod itself (see the example below).
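For reference, the in-cluster check looked roughly like the following; the pod name and namespace come from the target table further down, while the container name and the presence of curl in the image are assumptions:

    # Sketch of the in-cluster check; pod/namespace are examples from the
    # table below, the container name is an assumption.
    kubectl exec -n analysis analysis-deployment-5bf4dd8964-5sstm -c analysis -- \
      curl -s http://172.29.11.241:7000/metrics | head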

I tried creating scrape jobs through Pod/Service/Endpoints, but none of them seems to make any difference.

Here is my scrape job config:

    - job_name: 'den-jmxexport-pod'
      scheme: http
      scrape_interval: 15s

      kubernetes_sd_configs:
      - role: pod

      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_name]
        action: keep
        regex: (audit|web|loader|scheduler|environment|analysis);http-jmx
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod_name
      - source_labels: [__meta_kubernetes_namespace]
        target_label: nservicee
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: pod_node
      - source_labels: [__meta_kubernetes_pod_phase]
        target_label: pod_phase
      - source_labels: [__meta_kubernetes_pod_host_ip]
        target_label: pod_host_ip
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)

    - job_name: 'den-jmxexport-service'
      scheme: http
      scrape_interval: 15s

      kubernetes_sd_configs:
      - role: service

      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_service_port_name]
        action: keep
        regex: (audit-service|web-service|loader-service|scheduler-service|environment-service|analysis-service);http-jmx
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_service_cluster_ip]
        target_label: service_ip
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
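
For what it's worth, a promtool syntax check is a quick way to rule out problems with the config file itself (it does not exercise relabeling against live targets); the config path below is a placeholder:

    # Validate the Prometheus config syntax only; path is a placeholder.
    promtool check config /etc/prometheus/prometheus.yml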


As you can see from the targets below, all of the pods with the Istio sidecar enabled are DOWN; the one without the sidecar works fine.

Endpoint: http://172.29.11.241:7000/metrics
  State: DOWN
  Labels: app_kubernetes_io_instance="analysis" app_kubernetes_io_name="analysis" instance="172.29.11.241:7000" job="den-jmxexport-pod" nservicee="analysis" pod_host_ip="172.29.4.132" pod_name="analysis-deployment-5bf4dd8964-5sstm" pod_node="ip-172-29-4-132.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="5bf4dd8964"
  Last Scrape: 13.771s ago    Scrape Duration: 330.2ms
  Error: server returned HTTP status 503 Service Unavailable

Endpoint: http://172.29.12.212:7000/metrics
  State: DOWN
  Labels: app_kubernetes_io_instance="analysis" app_kubernetes_io_name="analysis" instance="172.29.12.212:7000" job="den-jmxexport-pod" nservicee="analysis" pod_host_ip="172.29.3.133" pod_name="analysis-deployment-5bf4dd8964-sztzp" pod_node="ip-172-29-3-133.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="5bf4dd8964"
  Last Scrape: 20.291s ago    Scrape Duration: 849.9ms
  Error: server returned HTTP status 503 Service Unavailable

Endpoint: http://172.29.15.175:7000/metrics
  State: DOWN
  Labels: app_kubernetes_io_instance="analysis" app_kubernetes_io_name="analysis" instance="172.29.15.175:7000" job="den-jmxexport-pod" nservicee="analysis" pod_host_ip="172.29.9.69" pod_name="analysis-deployment-5bf4dd8964-rswhw" pod_node="ip-172-29-9-69.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="5bf4dd8964"
  Last Scrape: 8.681s ago    Scrape Duration: 4.042s
  Error: server returned HTTP status 503 Service Unavailable

Endpoint: http://172.29.17.218:7000/metrics
  State: UP
  Labels: app_kubernetes_io_instance="eksdemo" app_kubernetes_io_name="scheduler" instance="172.29.17.218:7000" job="den-jmxexport-pod" nservicee="eksdemo" pod_host_ip="172.29.31.34" pod_name="scheduler-deployment-7fbb665c5c-8nxtg" pod_node="ip-172-29-31-34.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="7fbb665c5c"
  Last Scrape: 7.42s ago    Scrape Duration: 13.5ms
  Error: (none)

Shannon Carey

Nov 9, 2020, 4:28:17 PM
to Prometheus Users
I am seeing this issue with JMX Exporter 0.14.0. I have tried running it as a Java Agent and as a sidecar. I have tried setting the host explicitly to "127.0.0.1" and to "localhost". I have tried using "traffic.sidecar.istio.io/excludeInboundPorts" and "traffic.sidecar.istio.io/excludeOutboundPorts" annotations. Nothing has worked. Most of the time, I am able to see the Prometheus HTTP endpoint if I port-forward directly to a Pod (and the response time is quick, far below 10s). But Prometheus itself gets "EOF", "read: connection reset by peer", or "context deadline exceeded" (and I see the same behavior if accessing it from another Pod in the cluster).
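To be concrete, the annotations were along the following lines on the Deployment's pod template (7000 is the exporter port in my setup; treat this as a sketch rather than my exact manifest):

    # Sketch of the pod-template annotations tried; 7000 is the exporter port.
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "7000"
        traffic.sidecar.istio.io/excludeOutboundPorts: "7000"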

One strange thing I have noticed: if I use the "traffic.sidecar.istio.io" annotations and I do not specify a particular host for JMX Exporter, the first scrape by Prometheus succeeds. However, after that it always gets "context deadline exceeded". I have no idea why this is happening.

I am not able to get it working when I run JMX Exporter as a Java Agent in Kafka Connect inside Docker locally, either. When I try to access the metrics endpoint, I never get a response, and there appears to be no way to tell whether the JMX Exporter is running or has failed.
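For context, attaching the agent in that setup looks roughly like this (a sketch, assuming an image that starts Connect through the stock Kafka scripts, which pass KAFKA_OPTS to the JVM; the jar and config paths are placeholders mounted into the container):

    # Sketch: attach the agent via KAFKA_OPTS; 0.14.0 matches the version above,
    # port 7000 is arbitrary, and the paths are placeholders.
    export KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent-0.14.0.jar=7000:/opt/jmx-exporter/kafka-connect.yaml"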

Also, it looks like JMX Exporter uses JUL for logging, whereas Kafka Connect appears to always use SLF4J with slf4j-log4j12. It seems that, in order for the exporter's logging to show up, it'd be necessary to put the jul-to-slf4j bridge on the classpath.
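If I go that route, I believe the standard bridge setup looks roughly like this: put the jul-to-slf4j jar on the classpath and register the bridge handler via a JUL logging.properties (a sketch of the usual SLF4JBridgeHandler wiring, not something I've verified against Kafka Connect):

    # logging.properties, pointed at with -Djava.util.logging.config.file=...
    # Routes java.util.logging records (including the JMX Exporter's) to SLF4J.
    handlers = org.slf4j.bridge.SLF4JBridgeHandler
    .level = INFO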

If anyone has any ideas, I'm all ears!

-Shannon
