RabbitMQ failing health-checks probes

723 views
Skip to first unread message

juan Perez

unread,
Oct 17, 2022, 5:04:03 AM10/17/22
to rabbitmq-users
We are having a recent issue with RabbitMQ failing health-checks probes and therefore  the pod crashing, if we remove those probes the pod starts just fine, is anyone having this issue?

K8s Version v1.25.0
Kubectl v1.25.0
rabbitmq.yml
kubectl create namespace test
kubectl apply rabbitmq.yml

apiVersion: apps/v1
kind: Deployment
metadata:  
  labels:  
  name: rabbitmq
  namespace: test
spec:
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rabbitmq
        deployment: IDHP
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rabbitmq
              topologyKey: kubernetes.io/hostname
            weight: 50
      containers:
      - env:
        - name: RABBITMQ_DEFAULT_USER
          value: admin
        - name: RABBITMQ_DEFAULT_PASS
          value: password
        - name: RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS
          value: -rabbit path_prefix "/rabbitmq/"
        image: rabbitmq:3.10-alpine
        imagePullPolicy: Always
        livenessProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - ping
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: rabbitmq
        readinessProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - ping
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 800m
            memory: 2560Mi
          requests:
            cpu: 15m
            memory: 500Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities: {}
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: false
        startupProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - ping
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
      dnsPolicy: ClusterFirst      
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

juan Perez

unread,
Oct 17, 2022, 5:18:20 AM10/17/22
to rabbitmq-users
These errors are new, those probes have been setup for long time, only very recently started to fail, even tried the same yml in other clusters, same outcome

82s         Normal    Scheduled           pod/rabbitmq-555449589d-ltr2w    Successfully assigned test/rabbitmq-555449589d-ltr2w to docker-desktop
82s         Normal    SuccessfulCreate    replicaset/rabbitmq-555449589d   Created pod: rabbitmq-555449589d-ltr2w
82s         Normal    ScalingReplicaSet   deployment/rabbitmq              Scaled up replica set rabbitmq-555449589d to 1
81s         Normal    Pulling             pod/rabbitmq-555449589d-ltr2w    Pulling image "rabbitmq:3.10-alpine"
79s         Normal    Created             pod/rabbitmq-555449589d-ltr2w    Created container rabbitmq
79s         Normal    Started             pod/rabbitmq-555449589d-ltr2w    Started container rabbitmq
79s         Normal    Pulled              pod/rabbitmq-555449589d-ltr2w    Successfully pulled image "rabbitmq:3.10-alpine" in 1.881516892s
69s         Warning   Unhealthy           pod/rabbitmq-555449589d-ltr2w    Startup probe errored: rpc error: code = Unknown desc = deadline exceeded ("DeadlineExceeded"): context deadline exceeded
9s          Warning   Unhealthy           pod/rabbitmq-555449589d-ltr2w    Readiness probe errored: rpc error: code = Unknown desc = deadline exceeded ("DeadlineExceeded"): context deadline exceeded
9s          Warning   Unhealthy           pod/rabbitmq-555449589d-ltr2w    Liveness probe errored: rpc error: code = Unknown desc = deadline exceeded ("DeadlineExceeded"): context deadline exceeded

juan Perez

unread,
Oct 17, 2022, 5:58:54 AM10/17/22
to rabbitmq-users
Looks like increasing the initialDelay and some other tweaks made the pod to run again happily, will be interesting if any of you found something similar recently after upgrading K8s version, we had the previous settings in our ymls running without issues for long time

4m50s       Normal    Scheduled           pod/rabbitmq-84cc6797db-vxhhw    Successfully assigned test/rabbitmq-84cc6797db-vxhhw to docker-desktop
4m50s       Normal    SuccessfulCreate    replicaset/rabbitmq-84cc6797db   Created pod: rabbitmq-84cc6797db-vxhhw
4m50s       Normal    ScalingReplicaSet   deployment/rabbitmq              Scaled up replica set rabbitmq-84cc6797db to 1
4m49s       Normal    Pulling             pod/rabbitmq-84cc6797db-vxhhw    Pulling image "rabbitmq:3.10-alpine"
4m47s       Normal    Pulled              pod/rabbitmq-84cc6797db-vxhhw    Successfully pulled image "rabbitmq:3.10-alpine" in 1.534243971s
4m47s       Normal    Created             pod/rabbitmq-84cc6797db-vxhhw    Created container rabbitmq
4m47s       Normal    Started             pod/rabbitmq-84cc6797db-vxhhw    Started container rabbitmq

          failureThreshold: 30
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 10

        name: rabbitmq
        readinessProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - ping
          failureThreshold: 30
          initialDelaySeconds: 1
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 10

        resources:
          limits:
            cpu: 800m
            memory: 2560Mi
          requests:
            cpu: 200m

            memory: 500Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities: {}
          privileged: false
          readOnlyRootFilesystem: false
          runAsNonRoot: false
        startupProbe:
          exec:
            command:
            - rabbitmq-diagnostics
            - ping
          failureThreshold: 30
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 10

        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
      dnsPolicy: ClusterFirst
      imagePullSecrets:     

      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

On Monday, October 17, 2022 at 10:04:03 AM UTC+1 juan Perez wrote:
Reply all
Reply to author
Forward
0 new messages