Re: Prometheus unable to retrieve JMX Exporter metrics when Istio Sidecar enabled

Leo Li

Aug 23, 2019, 5:08:47 PM
to Prometheus Users
It seems there is a bug in version 0.11, which we are currently using. The bug was reported as fixed in version 0.12. Just FYI: https://github.com/prometheus/jmx_exporter/issues/304
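Once 0.12.0 is available to us, picking it up should just be a matter of swapping the agent jar. A rough sketch of the javaagent invocation (the jar path, the exporter config path, and the application jar are placeholders for our deployment; 7000 is the port the scrape jobs below target):

    # Sketch only: pin the agent to 0.12.0; paths are placeholders,
    # 7000 is the port the scrape jobs below target.
    java -javaagent:/opt/jmx_prometheus_javaagent-0.12.0.jar=7000:/opt/jmx-exporter/config.yaml \
         -jar app.jar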


On Thursday, August 22, 2019 at 4:30:56 PM UTC-4, Leo Li wrote:
Hi there,

I keep getting "server returned HTTP status 503 Service Unavailable" on my JMX Exporter scrape job for pods that have the Istio sidecar enabled. Has anyone encountered a similar issue before?

Here is what I found:
1. It works fine if I disable the Istio sidecar;
2. I am able to fetch the metrics fine with curl from any pod in the same k8s cluster, even from the Prometheus pod itself (see the example below).
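For reference, the in-cluster check looked roughly like the following; the pod name and namespace come from the target table further down, while the container name and the presence of curl in the image are assumptions:

    # Sketch of the in-cluster check; pod/namespace are examples from the
    # table below, the container name is an assumption.
    kubectl exec -n analysis analysis-deployment-5bf4dd8964-5sstm -c analysis -- \
      curl -s http://172.29.11.241:7000/metrics | head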

I tried creating scrape jobs through Pod/Service/Endpoints, but none of them seems to make any difference.

Here is my scrape job config:

    - job_name: 'den-jmxexport-pod'
      scheme: http
      scrape_interval: 15s

      kubernetes_sd_configs:
      - role: pod

      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_name]
        action: keep
        regex: (audit|web|loader|scheduler|environment|analysis);http-jmx
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod_name
      - source_labels: [__meta_kubernetes_namespace]
        target_label: nservicee
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: pod_node
      - source_labels: [__meta_kubernetes_pod_phase]
        target_label: pod_phase
      - source_labels: [__meta_kubernetes_pod_host_ip]
        target_label: pod_host_ip
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)

    - job_name: 'den-jmxexport-service'
      scheme: http
      scrape_interval: 15s

      kubernetes_sd_configs:
      - role: service

      relabel_configs:
      - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_service_port_name]
        action: keep
        regex: (audit-service|web-service|loader-service|scheduler-service|environment-service|analysis-service);http-jmx
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_service_cluster_ip]
        target_label: service_ip
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
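
For what it's worth, a promtool syntax check is a quick way to rule out problems with the config file itself (it does not exercise relabeling against live targets); the config path below is a placeholder:

    # Validate the Prometheus config syntax only; path is a placeholder.
    promtool check config /etc/prometheus/prometheus.yml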


As you can see from the targets below, all of the pods with the Istio sidecar enabled are DOWN; the one without the sidecar works fine.

Endpoint: http://172.29.11.241:7000/metrics
  State: DOWN
  Labels: app_kubernetes_io_instance="analysis" app_kubernetes_io_name="analysis" instance="172.29.11.241:7000" job="den-jmxexport-pod" nservicee="analysis" pod_host_ip="172.29.4.132" pod_name="analysis-deployment-5bf4dd8964-5sstm" pod_node="ip-172-29-4-132.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="5bf4dd8964"
  Last Scrape: 13.771s ago    Scrape Duration: 330.2ms
  Error: server returned HTTP status 503 Service Unavailable

Endpoint: http://172.29.12.212:7000/metrics
  State: DOWN
  Labels: app_kubernetes_io_instance="analysis" app_kubernetes_io_name="analysis" instance="172.29.12.212:7000" job="den-jmxexport-pod" nservicee="analysis" pod_host_ip="172.29.3.133" pod_name="analysis-deployment-5bf4dd8964-sztzp" pod_node="ip-172-29-3-133.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="5bf4dd8964"
  Last Scrape: 20.291s ago    Scrape Duration: 849.9ms
  Error: server returned HTTP status 503 Service Unavailable

Endpoint: http://172.29.15.175:7000/metrics
  State: DOWN
  Labels: app_kubernetes_io_instance="analysis" app_kubernetes_io_name="analysis" instance="172.29.15.175:7000" job="den-jmxexport-pod" nservicee="analysis" pod_host_ip="172.29.9.69" pod_name="analysis-deployment-5bf4dd8964-rswhw" pod_node="ip-172-29-9-69.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="5bf4dd8964"
  Last Scrape: 8.681s ago    Scrape Duration: 4.042s
  Error: server returned HTTP status 503 Service Unavailable

Endpoint: http://172.29.17.218:7000/metrics
  State: UP
  Labels: app_kubernetes_io_instance="eksdemo" app_kubernetes_io_name="scheduler" instance="172.29.17.218:7000" job="den-jmxexport-pod" nservicee="eksdemo" pod_host_ip="172.29.31.34" pod_name="scheduler-deployment-7fbb665c5c-8nxtg" pod_node="ip-172-29-31-34.us-east-2.compute.internal" pod_phase="Running" pod_template_hash="7fbb665c5c"
  Last Scrape: 7.42s ago    Scrape Duration: 13.5ms
  Error: (none)

Shannon Carey

Nov 9, 2020, 4:28:17 PM
to Prometheus Users
I am seeing this issue with JMX Exporter 0.14.0. I have tried running it as a Java Agent and as a sidecar. I have tried setting the host explicitly to "127.0.0.1" and to "localhost". I have tried using "traffic.sidecar.istio.io/excludeInboundPorts" and "traffic.sidecar.istio.io/excludeOutboundPorts" annotations. Nothing has worked. Most of the time, I am able to see the Prometheus HTTP endpoint if I port-forward directly to a Pod (and the response time is quick, far below 10s). But Prometheus itself gets "EOF", "read: connection reset by peer", or "context deadline exceeded" (and I see the same behavior if accessing it from another Pod in the cluster).
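To be concrete, the annotations were along the following lines on the Deployment's pod template (7000 is the exporter port in my setup; treat this as a sketch rather than my exact manifest):

    # Sketch of the pod-template annotations tried; 7000 is the exporter port.
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "7000"
        traffic.sidecar.istio.io/excludeOutboundPorts: "7000"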

One strange thing I have noticed: if I use the "traffic.sidecar.istio.io" annotations and I do not specify a particular host for JMX Exporter, the first scrape by Prometheus succeeds. However, after that it always gets "context deadline exceeded". I have no idea why this is happening.

I am not able to get it working when I run JMX Exporter as a Java Agent in Kafka Connect inside Docker locally, either. When I try to access the metrics endpoint, I never get a response, and there appears to be no way to tell whether the JMX Exporter is running or has failed.
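For context, attaching the agent in that setup looks roughly like this (a sketch, assuming an image that starts Connect through the stock Kafka scripts, which pass KAFKA_OPTS to the JVM; the jar and config paths are placeholders mounted into the container):

    # Sketch: attach the agent via KAFKA_OPTS; 0.14.0 matches the version above,
    # port 7000 is arbitrary, and the paths are placeholders.
    export KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent-0.14.0.jar=7000:/opt/jmx-exporter/kafka-connect.yaml"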

Also, it looks like JMX Exporter uses JUL for logging, whereas Kafka Connect appears to always use SLF4J with slf4j-log4j12. It seems that, in order for the exporter's logging to show up, it'd be necessary to put the jul-to-slf4j bridge on the classpath.
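If I go that route, I believe the standard bridge setup looks roughly like this: put the jul-to-slf4j jar on the classpath and register the bridge handler via a JUL logging.properties (a sketch of the usual SLF4JBridgeHandler wiring, not something I've verified against Kafka Connect):

    # logging.properties, pointed at with -Djava.util.logging.config.file=...
    # Routes java.util.logging records (including the JMX Exporter's) to SLF4J.
    handlers = org.slf4j.bridge.SLF4JBridgeHandler
    .level = INFO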

If anyone has any ideas, I'm all ears!

-Shannon
