Node fails to start with timeout_waiting_for_tables in a Kubernetes deployment


jianxin ren

Jun 2, 2020, 9:28:02 PM
to rabbitmq-users
  • My basic environment:
           
[{pid,473},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.7.14"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.7.14"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.7.14"},
      {cowboy,"Small, fast, modern HTTP server.","2.6.1"},
      {cowlib,"Support library for manipulating Web protocols.","2.7.0"},
      {amqp_client,"RabbitMQ AMQP Client","3.7.14"},
      {rabbitmq_peer_discovery_k8s,
          "Kubernetes-based RabbitMQ peer discovery backend","3.7.14"},
      {rabbitmq_peer_discovery_common,
          "Modules shared by various peer discovery backends","3.7.14"},
      {rabbit,"RabbitMQ","3.7.14"},
      {rabbit_common,
          "Modules shared by rabbitmq-server and rabbitmq-erlang-client",
          "3.7.14"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.7.1"},
      {ssl,"Erlang/OTP SSL application","9.2.3.1"},
      {public_key,"Public key infrastructure","1.6.6"},
      {crypto,"CRYPTO","4.4.2"},
      {os_mon,"CPO  CXC 138 46","2.4.7"},
      {sysmon_handler,"Rate-limiting system_monitor event handler","1.1.0"},
      {asn1,"The Erlang ASN1 compiler version 5.0.8","5.0.8"},
      {recon,"Diagnostic tools for production use","2.4.0"},
      {mnesia,"MNESIA  CXC 138 12","4.15.6"},
      {inets,"INETS  CXC 138 49","7.0.7"},
      {xmerl,"XML parser","1.3.20"},
      {jsx,"a streaming, evented json parsing toolkit","2.9.0"},
      {lager,"Erlang logging framework","3.6.9"},
      {goldrush,"Erlang event stream processor","0.1.9"},
      {compiler,"ERTS  CXC 138 10","7.3.2"},
      {syntax_tools,"Syntax tools","2.1.7"},
      {sasl,"SASL  CXC 138 11","3.3"},
      {stdlib,"ERTS  CXC 138 10","3.8.2"},
      {kernel,"ERTS  CXC 138 10","6.3.1"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 21 [erts-10.3.5.1] [source] [64-bit] [smp:80:56] [ds:80:56:10] [async-threads:896] [hipe]\n"},
 {memory,
     [{connection_readers,0},
      {connection_writers,0},
      {connection_channels,0},
      {connection_other,2820},
      {queue_procs,0},
      {queue_slave_procs,0},
      {plugins,1288348},
      {other_proc,27380476},
      {metrics,211204},
      {mgmt_db,255944},
      {mnesia,89184},
      {other_ets,2876160},
      {binary,156224},
      {msg_index,32368},
      {code,27750686},
      {atom,1352953},
      {other_system,62404169},
      {allocated_unused,99234856},
      {reserved_unallocated,0},
      {strategy,rss},
      {total,[{erlang,123800536},{rss,148848640},{allocated,223035392}]}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{http,15672,"::"}]},
 {vm_memory_calculation_strategy,rss},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,108026116505},
 {disk_free_limit,50000000},
 {disk_free,998596608},
 {file_descriptors,
     [{total_limit,1048476},
      {total_used,2},
      {sockets_limit,943626},
      {sockets_used,0}]},
 {processes,[{limit,1048576},{used,466}]},
 {run_queue,1},
 {uptime,51066},
 {kernel,{net_ticktime,90}}]
  • Basic description of my problem
           In particular, see https://github.com/rabbitmq/discussions/issues/108
           
           Here, thanks again to Michael for his initial guidance, but I still have doubts about "Error while waiting for Mnesia tables: ..." and hope to discuss it with you.
          
      Best Regards

Wesley Peng

Jun 2, 2020, 9:41:00 PM
to rabbitm...@googlegroups.com
jianxin ren wrote:
>            Here, thanks again to Michael for his initial guidance, but I still
> have doubts about "Error while waiting for Mnesia tables: ..." and hope to
> discuss it with you.

So, what is the specific issue you have? That RabbitMQ can't run on K8s?

Thanks.

Smith David

Jun 2, 2020, 10:23:00 PM
to rabbitmq-users
In a three-node cluster, manually force-delete a RabbitMQ pod/node. The deleted node can restart, but it logs the error "Error while waiting for Mnesia tables: ..." until the retries are exceeded, then crashes with "Application mnesia exited: stopped".

On Wednesday, June 3, 2020 at 9:41:00 AM UTC+8, Wesley Peng wrote:

Michael Klishin

Jun 2, 2020, 11:02:02 PM
to rabbitm...@googlegroups.com

This has been explained on this list and in RabbitMQ community Slack several times in the last week.

 

[1] explains what the message means. In the context of Kubernetes, that can mean that you use a readiness probe that requires the node to be fully booted, which will not be the case for restarted nodes unless their data was also deleted (in that case they will start as blank nodes). [2] explains what your options are for health checks/readiness probes.
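
As a sketch only (the probe command and timings below are illustrative, not taken from your manifest), a readiness probe that does not require a fully booted node could look along these lines:

    readinessProbe:
      exec:
        # "ping" only checks that the runtime is up and answers CLI tools;
        # unlike "status" or "check_running", it does not require the rabbit
        # application (and therefore all Mnesia tables) to have finished booting.
        command: ["rabbitmq-diagnostics", "ping"]
      initialDelaySeconds: 10
      periodSeconds: 30
      timeoutSeconds: 15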

 

There can be other scenarios, but you are not giving us a lot of context to work with. We would be able to suggest more if there were a detailed sequence of events and full logs from all nodes. A single-sentence problem description and a one-line error message are nowhere near enough information for an informed answer. If you are looking for free help on this list, please help others help you.

 

[3][4] can be used as examples of some fundamentals of deploying RabbitMQ to Kubernetes.

 

  1. https://www.rabbitmq.com/clustering.html#restarting
  2. https://www.rabbitmq.com/monitoring.html#health-checks
  3. https://github.com/rabbitmq/rabbitmq-peer-discovery-k8s/tree/master/examples
  4. https://www.youtube.com/watch?v=-yU95ocpBYs


Michael Klishin

Jun 2, 2020, 11:10:22 PM
to rabbitm...@googlegroups.com

[1] is another example that can be used as a reference. It reminds me of another possible scenario: `cluster_formation.node_cleanup.only_log_warning` can be set to true by some operators without understanding the consequences, and as a result the node that goes down to be re-created from its persistent volume is removed from the cluster in the meantime [2][3], and cannot rejoin.

 

The Kubernetes deployment file and logs from all nodes can prove this hypothesis right or wrong. Consider sharing them.

 

  1. https://zupzup.org/k8s-rabbitmq-cluster/
  2. https://www.rabbitmq.com/cluster-formation.html#rejoining
  3. https://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup

Michael Klishin

Jun 2, 2020, 11:12:38 PM
to rabbitm...@googlegroups.com

Oops, “`cluster_formation.node_cleanup.only_log_warning` can be set to true by some operators without understanding the consequences…” should read

“`cluster_formation.node_cleanup.only_log_warning` can be set to FALSE [enabling forced node removal] by some operators without understanding the consequences…”

 

The zupzup.org post does the safe thing; it just reminded me that these are scenarios we have seen in general, and specifically on Kubernetes.
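
For reference, a minimal sketch of how this setting looks in rabbitmq.conf (true is the default and the safe behaviour described above; the comment spells out what flipping it would do):

    # Only log a warning when peer discovery reports a node as unreachable.
    cluster_formation.node_cleanup.only_log_warning = true
    # Setting this to false enables forced removal of nodes that the discovery
    # backend no longer reports, which is what can leave a node that is being
    # re-created from its persistent volume unable to rejoin the cluster.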

jianxin ren

Jun 3, 2020, 2:49:03 AM
to rabbitmq-users
Thanks for sharing, Michael.
Yes, I have read those docs, but they do not seem to offer any more useful clues for my question.

My local rabbitmq.conf (as part of the ConfigMap):
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-cm-config-for-%APPNAME%
data:
  advanced.config: |
      [
        {kernel, [{net_ticktime, 60}]}
      ].
  enabled_plugins: |
      [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      cluster_formation.k8s.address_type = hostname
      cluster_formation.node_cleanup.interval = 30
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = autoheal
      queue_master_locator=min-masters
      loopback_users.guest = false
      cluster_formation.k8s.service_name = rabbitmq
      cluster_formation.k8s.hostname_suffix = .rabbitmq.%NAMESPACE%.svc.cluster.local
      #initialize broker data
      #management.load_definitions = /rabbitmq-def/definitions.json
      log.console.level = debug
      #mnesia_table_loading_retry_timeout = 30000
      #mnesia_table_loading_retry_limit = 10

With the ConfigMap and StatefulSet manifest files above, a three-node cluster comes up healthy. At some point, I run 'kubectl delete po/rabbitmq-2 --grace-period=0 --force'. After that, a new rabbitmq-2 pod is re-created very quickly, but 'rabbitmqctl cluster_status' run in one of the two remaining pods, e.g. rabbitmq-0, shows the following:
[{nodes,[{disc,['rab...@rabbitmq-0.rabbitmq.local-fast.svc.cluster.local',
                'rab...@rabbitmq-1.rabbitmq.local-fast.svc.cluster.local',
                'rab...@rabbitmq-2.rabbitmq.local-fast.svc.cluster.local']}]},
 {running_nodes,['rab...@rabbitmq-1.rabbitmq.local-fast.svc.cluster.local',
                 'rab...@rabbitmq-0.rabbitmq.local-fast.svc.cluster.local']},
 {cluster_name,<<"rab...@rabbitmq-0.rabbitmq.local-fast.svc.cluster.local">>},
 {partitions,[]},
 {alarms,[{'rab...@rabbitmq-1.rabbitmq.local-fast.svc.cluster.local',[]},
          {'rab...@rabbitmq-0.rabbitmq.local-fast.svc.cluster.local',[]}]}]

The running_nodes and alarms entries are missing the rabbitmq-2 node, and the log of the re-created rabbitmq-2 pod resembles:
2020-06-03 06:38:35.172 [info] <0.260.0> Running boot step worker_pool defined by app rabbit
2020-06-03 06:38:35.172 [debug] <0.260.0> Applying MFA: M = rabbit_sup, F = start_supervisor_child, A = [worker_pool_sup]
2020-06-03 06:38:35.174 [info] <0.260.0> Running boot step database defined by app rabbit
2020-06-03 06:38:35.174 [debug] <0.260.0> Applying MFA: M = rabbit_mnesia, F = init, A = []
2020-06-03 06:38:49.194 [info] <0.260.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-06-03 06:39:19.195 [warning] <0.260.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-03 06:39:19.195 [info] <0.260.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2020-06-03 06:39:49.196 [warning] <0.260.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-03 06:39:49.196 [info] <0.260.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2020-06-03 06:40:19.197 [warning] <0.260.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-03 06:40:19.197 [info] <0.260.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
...

while a log snippet from one of the remaining pods shows:
 
...
2020-06-03 06:39:06.797 [info] <0.433.0> rabbit on node 'rab...@rabbitmq-2.rabbitmq.local-fast.svc.cluster.local' down
2020-06-03 06:39:06.797 [error] <0.925.0> ** Node 'rab...@rabbitmq-2.rabbitmq.local-fast.svc.cluster.local' not responding **
** Removing (timedout) connection **
2020-06-03 06:39:06.823 [info] <0.433.0> Node rab...@rabbitmq-2.rabbitmq.local-fast.svc.cluster.local is down, deleting its listeners
2020-06-03 06:39:06.826 [info] <0.433.0> node 'rab...@rabbitmq-2.rabbitmq.local-fast.svc.cluster.local' down: net_tick_timeout
2020-06-03 06:39:07.890 [info] <0.433.0> node 'rab...@rabbitmq-2.rabbitmq.local-fast.svc.cluster.local' up
2020-06-03 06:39:25.200 [debug] <0.514.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-03 06:39:25.203 [debug] <0.514.0> Peer discovery: cleanup discovered unreachable nodes: ['rab...@rabbitmq-2.rabbitmq.local-fast.svc.cluster.local']
 
 

On Wednesday, June 3, 2020 at 11:10:22 AM UTC+8, Michael Klishin wrote:


jianxin ren

Jun 3, 2020, 3:04:28 AM
to rabbitmq-users
Sorry, the StatefulSet manifest YAML looks like:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: rabbitmq
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      imagePullSecrets:
      - name: regcred
      serviceAccountName: rabbitmq-sa-for-ha-dev
      terminationGracePeriodSeconds: 10
      containers:        
      - name: rabbitmq
        image: rabbitmq:3.8.3-management
        volumeMounts:
        - name: config-volume
          mountPath: /etc/rabbitmq
        - name: config-def-volume
          mountPath: /rabbitmq-def
        - name: rabbitmq-data-pvc-for-ha-dev
          mountPath: /var/lib/rabbitmq
        - name: log-path
          mountPath: /var/log/rabbitmq
        ports:
        - name: http-15672
          protocol: TCP
          containerPort: 15672
        - name: amqp-5672
          protocol: TCP
          containerPort: 5672
        livenessProbe:
          exec:
            command: ["rabbitmq-diagnostics", "status"]
          initialDelaySeconds: 10
          # See https://www.rabbitmq.com/monitoring.html for monitoring frequency recommendations.
          periodSeconds: 30
          timeoutSeconds: 15
        readinessProbe:
          exec:
            command: ["rabbitmq-diagnostics", "status"]
          initialDelaySeconds: 10
          periodSeconds: 30
          timeoutSeconds: 15
        imagePullPolicy: Always
        env:
            #- name: MY_POD_IP
            #valueFrom:
            #  fieldRef:
            #    fieldPath: status.podIP
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          # See a note on cluster_formation.k8s.address_type in the config file section
          - name: RABBITMQ_NODENAME
            value: "rabbit@$(MY_POD_NAME).rabbitmq.local-fast.svc.cluster.local"
          - name: K8S_SERVICE_NAME
            value: "rabbitmq"
          - name: RABBITMQ_ERLANG_COOKIE
            value: "mycookie" 
      volumes:
        - name: log-path
          hostPath:
            path: /tmp
        - name: config-def-volume
          configMap:
            name: rabbitmq-def-cm-config-for-ha-dev
            items:
            - key: definitions.json
              path: definitions.json
        - name: config-volume
          configMap:
            name: rabbitmq-cm-config-for-ha-dev
            items:
            - key: rabbitmq.conf
              path: rabbitmq.conf
            - key: enabled_plugins
              path: enabled_plugins
            - key: advanced.config
              path: advanced.config
  volumeClaimTemplates:
    - metadata: 
        name: rabbitmq-data-pvc-for-ha-dev
        #kind: PersistentVolumeClaim
        #apiVersion: v1
      spec:
        storageClassName: ceph-rbd
        accessModes: 
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi



On Wednesday, June 3, 2020 at 2:49:03 PM UTC+8, jianxin ren wrote:

Michael Klishin

Jun 3, 2020, 10:04:25 AM
to rabbitm...@googlegroups.com

I’m afraid I do not understand what

  • running nodes and alarms miss rabbitmq-2 node info

means. That there are no alarms on the node? That’s indeed the case, but how would alarms be relevant here? Do you check for alarms in the readiness probe? Speaking of which, you have not shared the Kubernetes deployment file in full, so we don’t know what the readiness probe is. Please do share all the information you are asked for (and edit any sensitive values as needed).

 

If you restart a node, it will expect all of its peers to be started within a window of time, retrying to connect to them [1]. If it cannot connect to them, then you need to find out what prevents such a rejoining node from contacting its running peers. Again, full server logs would likely provide all or most of the cues you’d need.
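
If the peers simply need more time to come back (for example when all nodes restart at roughly the same time), the table-wait window referenced in [1] can be widened. A sketch only, using the two settings that already appear commented out in the rabbitmq.conf posted earlier in this thread; the values are illustrative:

    # Defaults are 30000 ms per attempt and 10 attempts, which matches the
    # "Waiting for Mnesia tables for 30000 ms, 9 retries left" log lines.
    mnesia_table_loading_retry_timeout = 60000
    mnesia_table_loading_retry_limit = 15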

 

The messages about unreachable nodes certainly seem to be worth investigating.

 

What I am again not sure about is why you say “one remaining pod”. Are you losing 2 out of 3 nodes in the process? That’s not going to work very well with several features in modern RabbitMQ releases [2]. In the future such scenarios would be considered unrecoverable.

 

The autoheal partition handling strategy can further complicate the recovery and reconnection decisions of the nodes.
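
As an aside, and purely as a hedged suggestion: for clusters of three or more nodes, pause_minority is often preferred over autoheal, because a node that loses contact with the majority pauses itself rather than the cluster “healing” after the fact, which makes recovery easier to reason about. Whether it suits this workload is a separate decision:

    # Alternative partition handling strategy for clusters of 3+ nodes:
    cluster_partition_handling = pause_minority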

 

We are still very much *guessing* at what you do and what each of the nodes observes. If you are looking for informed advice from the community, *consider sharing as much information as possible*, as guessing is a very time-consuming way of debugging distributed systems.

 

  1. https://www.rabbitmq.com/clustering.html#restarting
  2. https://www.rabbitmq.com/quorum-queues.html#quorum-requirements


jianxin ren

Jun 3, 2020, 10:10:38 PM
to rabbitmq-users
(1)"running nodes and alarms miss rabbitmq-2 node info" looks like

 

at this time, in rabbitmq-2 pod log,  "Waiting for Mnesia tables for 30000 ms, 4 retries left. \n Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}" retries exceed, rabbitmq-2 pod restart,whose state changes looks like


at this time, rabbitmq-2 re-join cluster and cluster_status ok.




On Wednesday, June 3, 2020 at 10:04:25 PM UTC+8, Michael Klishin wrote:


jianxin ren

Jun 3, 2020, 10:15:59 PM
to rabbitmq-users
Please refer to the attached file for the deployment manifest.


On Wednesday, June 3, 2020 at 10:04:25 PM UTC+8, Michael Klishin wrote:


test-case.tar.gz

Michael Klishin

Jun 4, 2020, 4:13:45 PM
to rabbitm...@googlegroups.com

I’m afraid I do not understand the response or what is highlighted in the second screenshot.

 

Again, alarms, unless you check for them in the readiness probe, are likely not very relevant.

 

Consider posting *full logs* and not snippets of logs as screenshots if you are looking for an informed opinion.

I’m afraid I’m out of ideas with the amount of information we have.



Michael Klishin

Jun 4, 2020, 4:17:02 PM
to rabbitm...@googlegroups.com

I don’t see anything that would stand out in these deployment files, e.g. volumeClaimTemplates look reasonable.

 

We need a set of steps taken during the test and full logs from all nodes.
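
For example (a sketch; the pod names assume the StatefulSet shared earlier, and --previous captures the log of the crashed container instance):

    # Current logs from each node, plus the crashed container's previous run:
    kubectl logs rabbitmq-0 > rabbitmq-0.log
    kubectl logs rabbitmq-1 > rabbitmq-1.log
    kubectl logs rabbitmq-2 > rabbitmq-2.log
    kubectl logs rabbitmq-2 --previous > rabbitmq-2-previous.log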



jianxin ren

Jun 5, 2020, 2:33:39 AM
to rabbitmq-users
The second screenshot shows the crashed RabbitMQ pod's state transitions from deleted back to re-running: Terminating, Pending, ContainerCreating, and Running. Although the pod shows 1/1 Running, it was stuck in "Error while waiting for Mnesia tables..." until it crashed and exited. Then the state is Completed, which means rabbitmq-server inside the pod restarted. That is not what I expect.

On Friday, June 5, 2020 at 4:13:45 AM UTC+8, Michael Klishin wrote:


Smith David

Jun 12, 2020, 11:56:48 PM
to rabbitmq-users
Has anyone else faced the same problem?

On Wednesday, June 3, 2020 at 9:28:02 AM UTC+8, jianxin ren wrote:

jianxin ren

Jun 14, 2020, 12:50:21 AM
to rabbitmq-users
2020-06-14 04:40:54.560 [debug] <0.117.0> Lager installed handler error_logger_lager_h into error_logger
2020-06-14 04:40:54.560 [debug] <0.129.0> Lager installed handler lager_forwarder_backend into rabbit_log_connection_lager_event
2020-06-14 04:40:54.560 [debug] <0.120.0> Lager installed handler lager_forwarder_backend into error_logger_lager_event
2020-06-14 04:40:54.560 [debug] <0.123.0> Lager installed handler lager_forwarder_backend into rabbit_log_lager_event
2020-06-14 04:40:54.560 [debug] <0.126.0> Lager installed handler lager_forwarder_backend into rabbit_log_channel_lager_event
2020-06-14 04:40:54.560 [debug] <0.135.0> Lager installed handler lager_forwarder_backend into rabbit_log_mirroring_lager_event
2020-06-14 04:40:54.560 [debug] <0.138.0> Lager installed handler lager_forwarder_backend into rabbit_log_queue_lager_event
2020-06-14 04:40:54.560 [debug] <0.132.0> Lager installed handler lager_forwarder_backend into rabbit_log_ldap_lager_event
2020-06-14 04:40:54.560 [debug] <0.141.0> Lager installed handler lager_forwarder_backend into rabbit_log_federation_lager_event
2020-06-14 04:40:54.560 [debug] <0.144.0> Lager installed handler lager_forwarder_backend into rabbit_log_upgrade_lager_event
2020-06-14 04:40:55.048 [debug] <0.111.0> Lager installed handler lager_backend_throttle into lager_event
2020-06-14 04:41:19.854 [info] <0.257.0> 
 Starting RabbitMQ 3.7.14 on Erlang 21.3.8.1
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/

  ##  ##
  ##  ##      RabbitMQ 3.7.14. Copyright (C) 2007-2019 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See https://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2020-06-14 04:41:19.855 [info] <0.257.0> 
 node           : rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/advanced.config
                : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local
2020-06-14 04:41:20.369 [info] <0.257.0> Running boot step pre_boot defined by app rabbit
2020-06-14 04:41:20.369 [info] <0.257.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-06-14 04:41:20.369 [debug] <0.257.0> Applying MFA: M = rabbit_sup, F = start_child, A = [rabbit_metrics]
2020-06-14 04:41:20.372 [info] <0.257.0> Running boot step rabbit_alarm defined by app rabbit
2020-06-14 04:41:20.372 [debug] <0.257.0> Applying MFA: M = rabbit_alarm, F = start, A = []
2020-06-14 04:41:20.372 [debug] <0.263.0> Supervisor rabbit_alarm_sup started rabbit_alarm:start_link() at pid <0.264.0>
2020-06-14 04:41:20.376 [info] <0.266.0> Memory high watermark set to 103141 MiB (108151429529 bytes) of 257853 MiB (270378573824 bytes) total
2020-06-14 04:41:20.376 [debug] <0.265.0> Supervisor vm_memory_monitor_sup started vm_memory_monitor:start_link(0.4, #Fun<rabbit_alarm.0.13465474>, #Fun<rabbit_alarm.1.13465474>) at pid <0.266.0>
2020-06-14 04:41:20.433 [info] <0.268.0> Enabling free disk space monitoring
2020-06-14 04:41:20.433 [info] <0.268.0> Disk free limit set to 50MB
2020-06-14 04:41:20.437 [debug] <0.267.0> Supervisor rabbit_disk_monitor_sup started rabbit_disk_monitor:start_link(50000000) at pid <0.268.0>
2020-06-14 04:41:20.437 [info] <0.257.0> Running boot step code_server_cache defined by app rabbit
2020-06-14 04:41:20.437 [debug] <0.257.0> Applying MFA: M = rabbit_sup, F = start_child, A = [code_server_cache]
2020-06-14 04:41:20.437 [info] <0.257.0> Running boot step file_handle_cache defined by app rabbit
2020-06-14 04:41:20.437 [debug] <0.257.0> Applying MFA: M = rabbit, F = start_fhc, A = []
2020-06-14 04:41:20.438 [info] <0.271.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-06-14 04:41:20.438 [debug] <0.270.0> Supervisor file_handle_cache_sup started file_handle_cache:start_link(fun rabbit_alarm:set_alarm/1, fun rabbit_alarm:clear_alarm/1) at pid <0.271.0>
2020-06-14 04:41:20.438 [info] <0.272.0> FHC read buffering:  OFF
2020-06-14 04:41:20.438 [info] <0.272.0> FHC write buffering: ON
2020-06-14 04:41:20.438 [info] <0.257.0> Running boot step worker_pool defined by app rabbit
2020-06-14 04:41:20.438 [debug] <0.257.0> Applying MFA: M = rabbit_sup, F = start_supervisor_child, A = [worker_pool_sup]
2020-06-14 04:41:20.441 [info] <0.257.0> Running boot step database defined by app rabbit
2020-06-14 04:41:20.441 [debug] <0.257.0> Applying MFA: M = rabbit_mnesia, F = init, A = []
2020-06-14 04:41:34.465 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-06-14 04:42:04.466 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:42:04.466 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2020-06-14 04:42:34.467 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:42:34.467 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 7 retries left
2020-06-14 04:43:04.468 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:43:04.468 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2020-06-14 04:43:34.469 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:43:34.469 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 5 retries left
2020-06-14 04:44:04.470 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:44:04.470 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 4 retries left
2020-06-14 04:44:34.533 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:44:34.533 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 3 retries left
2020-06-14 04:45:04.534 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:45:04.534 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 2 retries left
2020-06-14 04:45:34.535 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:45:34.535 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 1 retries left
2020-06-14 04:46:04.536 [warning] <0.257.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]}
2020-06-14 04:46:04.536 [info] <0.257.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2020-06-14 04:46:34.537 [error] <0.256.0> CRASH REPORT Process <0.256.0> with 0 neighbours exited with reason: {{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}} in application_master:init/4 line 138
2020-06-14 04:46:34.538 [info] <0.43.0> Application rabbit exited with reason: {{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_route,rabbit_durable_exchange,rabbit_runtime_parameters,rabbit_durable_queue]},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{timeout_waiting_for_tables,[rabbit_user,rabbit_user_permission,rabbit_topic_permission,rabbit_vhost,rabbit_durable_r

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done

That is the full log from the crashed pod rabbitmq-2.

On Friday, June 5, 2020 at 4:13:45 AM UTC+8, Michael Klishin wrote:


jianxin ren

Jun 14, 2020, 1:01:48 AM
to rabbitmq-users
2020-06-14 04:46:53.235 [debug] <0.129.0> Lager installed handler lager_forwarder_backend into rabbit_log_connection_lager_event
2020-06-14 04:46:53.235 [debug] <0.117.0> Lager installed handler error_logger_lager_h into error_logger
2020-06-14 04:46:53.235 [debug] <0.123.0> Lager installed handler lager_forwarder_backend into rabbit_log_lager_event
2020-06-14 04:46:53.235 [debug] <0.135.0> Lager installed handler lager_forwarder_backend into rabbit_log_mirroring_lager_event
2020-06-14 04:46:53.235 [debug] <0.120.0> Lager installed handler lager_forwarder_backend into error_logger_lager_event
2020-06-14 04:46:53.235 [debug] <0.144.0> Lager installed handler lager_forwarder_backend into rabbit_log_upgrade_lager_event
2020-06-14 04:46:53.235 [debug] <0.126.0> Lager installed handler lager_forwarder_backend into rabbit_log_channel_lager_event
2020-06-14 04:46:53.235 [debug] <0.132.0> Lager installed handler lager_forwarder_backend into rabbit_log_ldap_lager_event
2020-06-14 04:46:53.235 [debug] <0.138.0> Lager installed handler lager_forwarder_backend into rabbit_log_queue_lager_event
2020-06-14 04:46:53.235 [debug] <0.141.0> Lager installed handler lager_forwarder_backend into rabbit_log_federation_lager_event
2020-06-14 04:46:53.659 [debug] <0.111.0> Lager installed handler lager_backend_throttle into lager_event
2020-06-14 04:46:56.657 [info] <0.43.0> Application mnesia exited with reason: stopped
2020-06-14 04:46:57.638 [info] <0.268.0> 
 Starting RabbitMQ 3.7.14 on Erlang 21.3.8.1
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/

  ##  ##
  ##  ##      RabbitMQ 3.7.14. Copyright (C) 2007-2019 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See https://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2020-06-14 04:46:57.638 [info] <0.268.0> 
 node           : rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/advanced.config
                : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local
2020-06-14 04:46:58.266 [info] <0.268.0> Running boot step pre_boot defined by app rabbit
2020-06-14 04:46:58.266 [info] <0.268.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-06-14 04:46:58.266 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_child, A = [rabbit_metrics]
2020-06-14 04:46:58.269 [info] <0.268.0> Running boot step rabbit_alarm defined by app rabbit
2020-06-14 04:46:58.269 [debug] <0.268.0> Applying MFA: M = rabbit_alarm, F = start, A = []
2020-06-14 04:46:58.270 [debug] <0.348.0> Supervisor rabbit_alarm_sup started rabbit_alarm:start_link() at pid <0.349.0>
2020-06-14 04:46:58.277 [info] <0.351.0> Memory high watermark set to 103141 MiB (108151429529 bytes) of 257853 MiB (270378573824 bytes) total
2020-06-14 04:46:58.277 [debug] <0.350.0> Supervisor vm_memory_monitor_sup started vm_memory_monitor:start_link(0.4, #Fun<rabbit_alarm.0.13465474>, #Fun<rabbit_alarm.1.13465474>) at pid <0.351.0>
2020-06-14 04:46:58.337 [info] <0.353.0> Enabling free disk space monitoring
2020-06-14 04:46:58.337 [info] <0.353.0> Disk free limit set to 50MB
2020-06-14 04:46:58.341 [debug] <0.352.0> Supervisor rabbit_disk_monitor_sup started rabbit_disk_monitor:start_link(50000000) at pid <0.353.0>
2020-06-14 04:46:58.342 [info] <0.268.0> Running boot step code_server_cache defined by app rabbit
2020-06-14 04:46:58.342 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_child, A = [code_server_cache]
2020-06-14 04:46:58.342 [info] <0.268.0> Running boot step file_handle_cache defined by app rabbit
2020-06-14 04:46:58.342 [debug] <0.268.0> Applying MFA: M = rabbit, F = start_fhc, A = []
2020-06-14 04:46:58.342 [info] <0.356.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-06-14 04:46:58.342 [debug] <0.355.0> Supervisor file_handle_cache_sup started file_handle_cache:start_link(fun rabbit_alarm:set_alarm/1, fun rabbit_alarm:clear_alarm/1) at pid <0.356.0>
2020-06-14 04:46:58.342 [info] <0.357.0> FHC read buffering:  OFF
2020-06-14 04:46:58.342 [info] <0.357.0> FHC write buffering: ON
2020-06-14 04:46:58.343 [info] <0.268.0> Running boot step worker_pool defined by app rabbit
2020-06-14 04:46:58.343 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_supervisor_child, A = [worker_pool_sup]
2020-06-14 04:46:58.345 [info] <0.268.0> Running boot step database defined by app rabbit
2020-06-14 04:46:58.345 [debug] <0.268.0> Applying MFA: M = rabbit_mnesia, F = init, A = []
2020-06-14 04:46:58.350 [info] <0.268.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-06-14 04:46:58.361 [info] <0.268.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-06-14 04:46:58.437 [info] <0.268.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-06-14 04:46:58.438 [debug] <0.268.0> Peer discovery backend supports initialisation.
2020-06-14 04:46:58.438 [debug] <0.268.0> Peer discovery Kubernetes: initialising...
2020-06-14 04:46:58.438 [debug] <0.268.0> HTTP client proxy is not configured
2020-06-14 04:46:58.438 [debug] <0.268.0> Peer discovery backend initialisation succeeded.
2020-06-14 04:46:58.438 [info] <0.268.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping registration.
2020-06-14 04:46:58.438 [info] <0.268.0> Running boot step database_sync defined by app rabbit
2020-06-14 04:46:58.438 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_child, A = [mnesia_sync]
2020-06-14 04:46:58.438 [info] <0.268.0> Running boot step codec_correctness_check defined by app rabbit
2020-06-14 04:46:58.438 [debug] <0.268.0> Applying MFA: M = rabbit_binary_generator, F = check_empty_frame_size, A = []
2020-06-14 04:46:58.438 [info] <0.268.0> Running boot step external_infrastructure defined by app rabbit
2020-06-14 04:46:58.438 [info] <0.268.0> Running boot step rabbit_registry defined by app rabbit
2020-06-14 04:46:58.438 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_child, A = [rabbit_registry]
2020-06-14 04:46:58.438 [info] <0.268.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2020-06-14 04:46:58.438 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [auth_mechanism,<<"RABBIT-CR-DEMO">>,rabbit_auth_mechanism_cr_demo]
2020-06-14 04:46:58.438 [info] <0.268.0> Running boot step rabbit_queue_location_random defined by app rabbit
2020-06-14 04:46:58.438 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [queue_master_locator,<<"random">>,rabbit_queue_location_random]
2020-06-14 04:46:58.439 [info] <0.268.0> Running boot step rabbit_event defined by app rabbit
2020-06-14 04:46:58.439 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [rabbit_event]
2020-06-14 04:46:58.439 [debug] <0.425.0> Supervisor rabbit_event_sup started rabbit_event:start_link() at pid <0.426.0>
2020-06-14 04:46:58.439 [info] <0.268.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2020-06-14 04:46:58.439 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [auth_mechanism,<<"AMQPLAIN">>,rabbit_auth_mechanism_amqplain]
2020-06-14 04:46:58.439 [info] <0.268.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2020-06-14 04:46:58.439 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [auth_mechanism,<<"PLAIN">>,rabbit_auth_mechanism_plain]
2020-06-14 04:46:58.439 [info] <0.268.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2020-06-14 04:46:58.439 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [exchange,<<"direct">>,rabbit_exchange_type_direct]
2020-06-14 04:46:58.439 [info] <0.268.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2020-06-14 04:46:58.439 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [exchange,<<"fanout">>,rabbit_exchange_type_fanout]
2020-06-14 04:46:58.439 [info] <0.268.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2020-06-14 04:46:58.439 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [exchange,<<"headers">>,rabbit_exchange_type_headers]
2020-06-14 04:46:58.440 [info] <0.268.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2020-06-14 04:46:58.440 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [exchange,<<"topic">>,rabbit_exchange_type_topic]
2020-06-14 04:46:58.440 [info] <0.268.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2020-06-14 04:46:58.440 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [ha_mode,<<"all">>,rabbit_mirror_queue_mode_all]
2020-06-14 04:46:58.440 [info] <0.268.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2020-06-14 04:46:58.440 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [ha_mode,<<"exactly">>,rabbit_mirror_queue_mode_exactly]
2020-06-14 04:46:58.440 [info] <0.268.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2020-06-14 04:46:58.440 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [ha_mode,<<"nodes">>,rabbit_mirror_queue_mode_nodes]
2020-06-14 04:46:58.440 [info] <0.268.0> Running boot step rabbit_priority_queue defined by app rabbit
2020-06-14 04:46:58.440 [debug] <0.268.0> Applying MFA: M = rabbit_priority_queue, F = enable, A = []
2020-06-14 04:46:58.440 [info] <0.268.0> Priority queues enabled, real BQ is rabbit_variable_queue
2020-06-14 04:46:58.440 [info] <0.268.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2020-06-14 04:46:58.440 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [queue_master_locator,<<"client-local">>,rabbit_queue_location_client_local]
2020-06-14 04:46:58.441 [info] <0.268.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2020-06-14 04:46:58.441 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [queue_master_locator,<<"min-masters">>,rabbit_queue_location_min_masters]
2020-06-14 04:46:58.441 [info] <0.268.0> Running boot step kernel_ready defined by app rabbit
2020-06-14 04:46:58.441 [info] <0.268.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2020-06-14 04:46:58.441 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [rabbit_sysmon_minder]
2020-06-14 04:46:58.441 [debug] <0.427.0> Supervisor rabbit_sysmon_minder_sup started rabbit_sysmon_minder:start_link() at pid <0.428.0>
2020-06-14 04:46:58.441 [info] <0.268.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2020-06-14 04:46:58.441 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [rabbit_epmd_monitor]
2020-06-14 04:46:58.442 [debug] <0.429.0> Supervisor rabbit_epmd_monitor_sup started rabbit_epmd_monitor:start_link() at pid <0.430.0>
2020-06-14 04:46:58.442 [info] <0.268.0> Running boot step guid_generator defined by app rabbit
2020-06-14 04:46:58.442 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [rabbit_guid]
2020-06-14 04:46:58.454 [info] <0.268.0> Running boot step rabbit_node_monitor defined by app rabbit
2020-06-14 04:46:58.454 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [rabbit_node_monitor]
2020-06-14 04:46:58.454 [debug] <0.431.0> Supervisor rabbit_guid_sup started rabbit_guid:start_link() at pid <0.432.0>
2020-06-14 04:46:58.516 [info] <0.434.0> Starting rabbit_node_monitor
2020-06-14 04:46:58.516 [info] <0.268.0> Running boot step delegate_sup defined by app rabbit
2020-06-14 04:46:58.516 [debug] <0.268.0> Applying MFA: M = rabbit, F = boot_delegate, A = []
2020-06-14 04:46:58.516 [debug] <0.433.0> Supervisor rabbit_node_monitor_sup started rabbit_node_monitor:start_link() at pid <0.434.0>
2020-06-14 04:46:58.517 [info] <0.268.0> Running boot step rabbit_memory_monitor defined by app rabbit
2020-06-14 04:46:58.517 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [rabbit_memory_monitor]
2020-06-14 04:46:58.517 [info] <0.268.0> Running boot step core_initialized defined by app rabbit
2020-06-14 04:46:58.517 [info] <0.268.0> Running boot step upgrade_queues defined by app rabbit
2020-06-14 04:46:58.517 [debug] <0.268.0> Applying MFA: M = rabbit_upgrade, F = maybe_migrate_queues_to_per_vhost_storage, A = []
2020-06-14 04:46:58.517 [debug] <0.452.0> Supervisor rabbit_memory_monitor_sup started rabbit_memory_monitor:start_link() at pid <0.453.0>
2020-06-14 04:46:58.548 [info] <0.268.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = gen_event, F = add_handler, A = [rabbit_event,rabbit_connection_tracking_handler,[]]
2020-06-14 04:46:58.548 [info] <0.268.0> Running boot step rabbit_exchange_parameters defined by app rabbit
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = rabbit_exchange_parameters, F = register, A = []
2020-06-14 04:46:58.548 [info] <0.268.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [policy_validator,<<"ha-mode">>,rabbit_mirror_queue_misc]
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [policy_validator,<<"ha-params">>,rabbit_mirror_queue_misc]
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [policy_validator,<<"ha-sync-mode">>,rabbit_mirror_queue_misc]
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [policy_validator,<<"ha-sync-batch-size">>,rabbit_mirror_queue_misc]
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [policy_validator,<<"ha-promote-on-shutdown">>,rabbit_mirror_queue_misc]
2020-06-14 04:46:58.548 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [policy_validator,<<"ha-promote-on-failure">>,rabbit_mirror_queue_misc]
2020-06-14 04:46:58.549 [info] <0.268.0> Running boot step rabbit_policies defined by app rabbit
2020-06-14 04:46:58.549 [debug] <0.268.0> Applying MFA: M = rabbit_policies, F = register, A = []
2020-06-14 04:46:58.549 [info] <0.268.0> Running boot step rabbit_policy defined by app rabbit
2020-06-14 04:46:58.549 [debug] <0.268.0> Applying MFA: M = rabbit_policy, F = register, A = []
2020-06-14 04:46:58.550 [info] <0.268.0> Running boot step rabbit_queue_location_validator defined by app rabbit
2020-06-14 04:46:58.550 [debug] <0.268.0> Applying MFA: M = rabbit_registry, F = register, A = [policy_validator,<<"queue-master-locator">>,rabbit_queue_location_validator]
2020-06-14 04:46:58.550 [info] <0.268.0> Running boot step rabbit_vhost_limit defined by app rabbit
2020-06-14 04:46:58.550 [debug] <0.268.0> Applying MFA: M = rabbit_vhost_limit, F = register, A = []
2020-06-14 04:46:58.550 [info] <0.268.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management
2020-06-14 04:46:58.550 [debug] <0.268.0> Applying MFA: M = gen_event, F = add_handler, A = [rabbit_event,rabbit_mgmt_reset_handler,[]]
2020-06-14 04:46:58.550 [info] <0.268.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent
2020-06-14 04:46:58.550 [debug] <0.268.0> Applying MFA: M = rabbit_mgmt_db_handler, F = add_handler, A = []
2020-06-14 04:46:58.550 [info] <0.268.0> Management plugin: using rates mode 'basic'
2020-06-14 04:46:58.635 [info] <0.268.0> Running boot step recovery defined by app rabbit
2020-06-14 04:46:58.635 [debug] <0.268.0> Applying MFA: M = rabbit, F = recover, A = []
2020-06-14 04:46:58.637 [debug] <0.465.0> Recovering data for VHost <<"/test/vh1">>
2020-06-14 04:46:58.637 [info] <0.465.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local/msg_stores/vhosts/2UT7I7VE3PMS2Q4I02Z9CE4VM' for vhost '/test/vh1' exists
2020-06-14 04:46:58.639 [debug] <0.464.0> Supervisor {<0.464.0>,rabbit_vhost_sup} started rabbit_recovery_terms:start_link(<<"/test/vh1">>) at pid <0.466.0>
2020-06-14 04:46:58.652 [info] <0.465.0> Starting message stores for vhost '/test/vh1'
2020-06-14 04:46:58.652 [info] <0.469.0> Message store "2UT7I7VE3PMS2Q4I02Z9CE4VM/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2020-06-14 04:46:58.653 [info] <0.465.0> Started message store of type transient for vhost '/test/vh1'
2020-06-14 04:46:58.653 [debug] <0.464.0> Supervisor {<0.464.0>,rabbit_vhost_sup} started rabbit_msg_store:start_link(msg_store_transient, "/var/lib/rabbitmq/mnesia/rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local/msg_stores/vh...", undefined, {#Fun<rabbit_variable_queue.2.103200040>,ok}) at pid <0.469.0>
2020-06-14 04:46:58.653 [info] <0.472.0> Message store "2UT7I7VE3PMS2Q4I02Z9CE4VM/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2020-06-14 04:46:58.653 [warning] <0.472.0> Message store "2UT7I7VE3PMS2Q4I02Z9CE4VM/msg_store_persistent": rebuilding indices from scratch
2020-06-14 04:46:58.654 [info] <0.465.0> Started message store of type persistent for vhost '/test/vh1'
2020-06-14 04:46:58.654 [debug] <0.464.0> Supervisor {<0.464.0>,rabbit_vhost_sup} started rabbit_msg_store:start_link(msg_store_persistent, "/var/lib/rabbitmq/mnesia/rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local/msg_stores/vh...", [], {#Fun<rabbit_queue_index.2.32138423>,{start,[]}}) at pid <0.472.0>
2020-06-14 04:46:58.654 [debug] <0.464.0> Supervisor {<0.464.0>,rabbit_vhost_sup} started rabbit_amqqueue_sup_sup:start_link() at pid <0.478.0>
2020-06-14 04:46:58.656 [info] <0.268.0> Running boot step load_definitions defined by app rabbitmq_management
2020-06-14 04:46:58.656 [debug] <0.268.0> Applying MFA: M = rabbit_mgmt_load_definitions, F = maybe_load_definitions, A = []
2020-06-14 04:46:58.656 [info] <0.268.0> Applying definitions from /rabbitmq-def/definitions.json
2020-06-14 04:46:58.656 [info] <0.268.0> Asked to import definitions. Acting user: rmq-internal
2020-06-14 04:46:58.656 [info] <0.268.0> Importing users...
2020-06-14 04:46:58.675 [info] <0.268.0> Setting user tags for user 'test_qa' to [administrator]
2020-06-14 04:46:58.693 [info] <0.268.0> Importing vhosts...
2020-06-14 04:46:58.693 [info] <0.268.0> Importing user permissions...
2020-06-14 04:46:58.693 [info] <0.268.0> Setting permissions for 'test_qa' in '/test/vh1' to '.*', '.*', '.*'
2020-06-14 04:46:58.711 [info] <0.268.0> Importing topic permissions...
2020-06-14 04:46:58.711 [info] <0.268.0> Importing parameters...
2020-06-14 04:46:58.711 [info] <0.268.0> Importing global parameters...
2020-06-14 04:46:58.711 [info] <0.268.0> Importing policies...
2020-06-14 04:46:58.750 [info] <0.268.0> Importing queues...
2020-06-14 04:46:58.750 [info] <0.268.0> Importing exchanges...
2020-06-14 04:46:58.750 [info] <0.268.0> Importing bindings...
2020-06-14 04:46:58.750 [info] <0.268.0> Running boot step empty_db_check defined by app rabbit
2020-06-14 04:46:58.750 [debug] <0.268.0> Applying MFA: M = rabbit, F = maybe_insert_default_data, A = []
2020-06-14 04:46:58.750 [info] <0.268.0> Running boot step rabbit_looking_glass defined by app rabbit
2020-06-14 04:46:58.750 [debug] <0.268.0> Applying MFA: M = rabbit_looking_glass, F = boot, A = []
2020-06-14 04:46:58.750 [info] <0.268.0> Running boot step rabbit_core_metrics_gc defined by app rabbit
2020-06-14 04:46:58.750 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [rabbit_core_metrics_gc]
2020-06-14 04:46:58.750 [info] <0.268.0> Running boot step background_gc defined by app rabbit
2020-06-14 04:46:58.750 [debug] <0.268.0> Applying MFA: M = rabbit_sup, F = start_restartable_child, A = [background_gc]
2020-06-14 04:46:58.750 [debug] <0.494.0> Supervisor rabbit_core_metrics_gc_sup started rabbit_core_metrics_gc:start_link() at pid <0.495.0>
2020-06-14 04:46:58.751 [info] <0.268.0> Running boot step connection_tracking defined by app rabbit
2020-06-14 04:46:58.751 [debug] <0.496.0> Supervisor background_gc_sup started background_gc:start_link() at pid <0.497.0>
2020-06-14 04:46:58.751 [debug] <0.268.0> Applying MFA: M = rabbit_connection_tracking, F = boot, A = []
2020-06-14 04:46:58.753 [info] <0.268.0> Setting up a table for connection tracking on this node: 'tracked_connecti...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local'
2020-06-14 04:46:58.755 [info] <0.268.0> Setting up a table for per-vhost connection counting on this node: 'tracked_connection_pe...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local'
2020-06-14 04:46:58.756 [info] <0.268.0> Running boot step routing_ready defined by app rabbit
2020-06-14 04:46:58.756 [info] <0.268.0> Running boot step pre_flight defined by app rabbit
2020-06-14 04:46:58.756 [info] <0.268.0> Running boot step notify_cluster defined by app rabbit
2020-06-14 04:46:58.756 [debug] <0.268.0> Applying MFA: M = rabbit_node_monitor, F = notify_node_up, A = []
2020-06-14 04:46:58.756 [info] <0.268.0> Running boot step networking defined by app rabbit
2020-06-14 04:46:58.756 [debug] <0.268.0> Applying MFA: M = rabbit_networking, F = boot, A = []
2020-06-14 04:46:58.757 [info] <0.434.0> rabbit on node 'rab...@rabbitmq-1.rabbitmq.rabbitmq-ns.svc.cluster.local' up
2020-06-14 04:46:58.758 [warning] <0.500.0> Setting Ranch options together with socket options is deprecated. Please use the new map syntax that allows specifying socket options separately from other options.
2020-06-14 04:46:58.758 [info] <0.514.0> started TCP listener on [::]:5672
2020-06-14 04:46:58.759 [info] <0.268.0> Running boot step direct_client defined by app rabbit
2020-06-14 04:46:58.759 [debug] <0.268.0> Applying MFA: M = rabbit_direct, F = boot, A = []
2020-06-14 04:46:58.759 [debug] <0.518.0> HTTP client proxy is not configured
2020-06-14 04:46:58.759 [info] <0.520.0> Peer discovery: enabling node cleanup (will only log warnings). Check interval: 30 seconds.
2020-06-14 04:46:58.810 [info] <0.434.0> rabbit on node 'rab...@rabbitmq-2.rabbitmq.rabbitmq-ns.svc.cluster.local' up
2020-06-14 04:46:58.849 [debug] <0.541.0> Supervisor rabbit_mgmt_agent_sup_sup started rabbit_mgmt_agent_sup:start_link() at pid <0.542.0>
2020-06-14 04:46:58.937 [debug] <0.536.0> Starting HTTP[S] listener with transport ranch_tcp, options [{port,15672}] and protocol options #{}, stream handlers [rabbit_cowboy_stream_h,cowboy_compress_h,cowboy_stream_h]
2020-06-14 04:46:58.939 [info] <0.582.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2020-06-14 04:46:58.939 [info] <0.688.0> Statistics database started.
2020-06-14 04:46:58.939 [debug] <0.686.0> Supervisor rabbit_mgmt_sup_sup started rabbit_mgmt_sup:start_link() at pid <0.687.0>
2020-06-14 04:46:59.337 [info] <0.8.0> Server startup complete; 5 plugins started.
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
 * rabbitmq_peer_discovery_k8s
 * rabbitmq_peer_discovery_common
 completed with 5 plugins.
2020-06-14 04:47:28.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:47:28.764 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.
2020-06-14 04:47:58.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:47:58.835 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.
2020-06-14 04:48:28.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:48:28.833 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.
2020-06-14 04:48:58.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:48:58.835 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.
2020-06-14 04:49:28.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:49:28.764 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.
2020-06-14 04:49:58.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:49:58.764 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.
2020-06-14 04:50:28.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:50:28.764 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.
2020-06-14 04:50:58.761 [debug] <0.520.0> Peer discovery: checking for partitioned nodes to clean up.
2020-06-14 04:50:58.833 [debug] <0.520.0> Peer discovery: all known cluster nodes are up.

That is the full log after the crashed pod rabbitmq-2 restarted, but the restart is not what I expected.

On Friday, June 5, 2020 at 4:13:45 AM UTC+8, Michael Klishin wrote:

