Upgrade from Prometheus 2.3.2 to 2.17.1 broke external k8s API discovery

Miguel Bernabeu Diaz

Apr 15, 2020, 4:46:17 AM
to Prometheus Users
Hi!

I have just upgraded from Prometheus 2.3.2 to Prometheus 2.17.1, changing nothing except the binaries, and scraping of our Kubernetes clusters started to fail. We host the Prometheus server on a dedicated machine in EC2 and access the K8s API over the internal network. This had been working fine for several months until this upgrade.

These errors started appearing in the logs:

level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:407: Failed to list *v1.Service: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:362: Failed to list *v1.Service: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:385: Failed to list *v1.Pod: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/pods?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:361: Failed to list *v1.Endpoints: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.842Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:363: Failed to list *v1.Pod: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/pods?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"
level=error ts=2020-04-15T08:35:09.846Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:449: Failed to list *v1.Node: Get http://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 172.16.67.74:80: connect: connection timed out"

It seems to me that this is a similar issue to the one described in https://github.com/prometheus/prometheus/issues/5108 but in the discovery phase. Plain HTTP access to the API server, which is what is being attempted here, has never been allowed, and that was not causing issues in the past. A sample of our configuration is at the end of the message. Any ideas or insights would be much appreciated.

Regards,
Miguel

prometheus.yml:
[...]
  - job_name: 'develop-kubelet'
    metrics_path: '/metrics'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
      cert_file: /etc/prometheus/certs/develop.k8s.local.crt
      key_file: /etc/prometheus/certs/develop.k8s.local.key
    kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: node
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/
    - source_labels: [kubernetes_io_hostname]
      target_label: node
 
  - job_name: 'develop-container'
    metrics_path: '/metrics'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
      cert_file: /etc/prometheus/certs/develop.k8s.local.crt
      key_file: /etc/prometheus/certs/develop.k8s.local.key
    kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: node
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
    - source_labels: [__meta_kubernetes_node_name]
      target_label: node
    - source_labels: [kubernetes_io_hostname]
      target_label: node

  - job_name: 'develop-endpoint'
    metrics_path: '/metrics'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
      cert_file: /etc/prometheus/certs/develop.k8s.local.crt
      key_file: /etc/prometheus/certs/develop.k8s.local.key
    kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: endpoints
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
    relabel_configs:
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_service_name
      - __meta_kubernetes_endpoint_port_name
      separator: ;
      regex: default;kubernetes;https
      replacement: $1
      action: keep
 
  - job_name: 'develop-pod'
    metrics_path: '/metrics'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
      cert_file: /etc/prometheus/certs/develop.k8s.local.crt
      key_file: /etc/prometheus/certs/develop.k8s.local.key
    kubernetes_sd_configs:
    - api_server: internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: pod
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - target_label: __address__
      replacement: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      regex: ^$
      replacement: http
      target_label: __meta_kubernetes_pod_annotation_prometheus_io_scheme
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      regex: (.+)
      replacement: ${1}
      target_label: __metrics_path__
    - source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_pod_annotation_prometheus_io_scheme
      - __meta_kubernetes_pod_name
      - __meta_kubernetes_pod_annotation_prometheus_io_port
      - __metrics_path__
      regex: (.+);(.+);(.+);(.+);(.+)
      action: replace
      target_label: __metrics_path__
      replacement: /api/v1/namespaces/${1}/pods/${2}:${3}:${4}/proxy${5}
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: node
    - source_labels: [__meta_kubernetes_pod_name]
      target_label: service
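
For reference, the log lines above show the discovery client requesting http://... on port 80, while every api_server entry in this configuration is a bare hostname with no scheme. The fragment below is a minimal sketch of one discovery block with the scheme written out explicitly. It rests on the assumption, not confirmed anywhere in this message, that 2.17.1 falls back to plain HTTP when api_server carries no scheme. Everything except the added https:// prefix is copied from the develop-kubelet job.

```yaml
    kubernetes_sd_configs:
    # Assumption: an explicit https:// prefix makes discovery use TLS
    # instead of plain HTTP on port 80. Hostname and cert paths are unchanged.
    - api_server: https://internal-api-develop-k8s-local-meks2l-1292322695.us-east-1.elb.amazonaws.com
      role: node
      tls_config:
        ca_file: /etc/prometheus/certs/develop.k8s.local.ca.crt
        cert_file: /etc/prometheus/certs/develop.k8s.local.crt
        key_file: /etc/prometheus/certs/develop.k8s.local.key
```

If this assumption holds, the same prefix would be needed on the api_server line of the develop-container, develop-endpoint, and develop-pod jobs as well.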
