Unable to add or remove nodes from RabbitMQ cluster


Nick

Feb 21, 2017, 7:15:56 PM2/21/17
to rabbitmq-users
Hi All,

I recently inherited an older 3-node RabbitMQ cluster (3.2.4) that has some issues. One of the nodes went down and was completely unrecoverable; every attempt to remove it from the cluster with the following command hangs:
rabbitmqctl forget_cluster_node --offline <node>
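For completeness, here is the removal sequence I've been attempting, as I understand it from the clustering guide (the node name is an example from my setup):

```shell
# Normally run on a node that is still up, naming the dead node:
rabbitmqctl forget_cluster_node rabbit@prod-messaging22

# The --offline form is for when it has to be run from a stopped node
# instead; this is the variant that hangs for me:
rabbitmqctl forget_cluster_node --offline rabbit@prod-messaging22
```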

I built a new node and was able to get it online, but it failed quickly because the hardware it was running on went down. I can't get that one removed either, using the same command.

I built a second node, which won't join the cluster at all; it just hangs:

-bash-4.1$ rabbitmqctl join_cluster rabbit@prod-messaging20
Clustering node 'rabbit@prod-messaging29' with 'rabbit@prod-messaging20' ...
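In case it matters, this is the full sequence I'm running on the new node before the join, as I understand it from the clustering docs; the join_cluster step is where it hangs:

```shell
# Stop the app and reset so the new node carries no stale state,
# then join an existing disc node and restart the app:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@prod-messaging20
rabbitmqctl start_app
```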

Doing some research, I tried to run
rabbitmqctl report
and it hangs on both of my nodes at the channels section.

Digging through the logs, I see the following errors; I'm not sure how to interpret them.

RAM node errors

rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.21966.189>,#Ref<0.0.10610.90382>},{basic_get,<0.21966.189>,false,<0.21965.189>}} from <0.21966.189> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.23095.3>,#Ref<0.0.10610.90910>},stat} from <0.23095.3> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.26279.11>,#Ref<0.0.10610.90932>},stat} from <0.26279.11> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.21966.189>,#Ref<0.0.10610.91103>},{basic_get,<0.21966.189>,false,<0.21965.189>}} from <0.21966.189> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.8960.13>,#Ref<0.0.10610.91438>},stat} from <0.8960.13> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.26279.11>,#Ref<0.0.10610.91840>},stat} from <0.26279.11> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.21966.189>,#Ref<0.0.10610.92054>},{basic_get,<0.21966.189>,false,<0.21965.189>}} from <0.21966.189> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.30076.11>,#Ref<0.0.10610.92227>},stat} from <0.30076.11> to <0.4950.0> in an old incarnation (1) of this node (2)
rabbit@prod-messaging21.log:Discarding message {'$gen_call',{<0.26279.11>,#Ref<0.0.10610.92270>},stat} from <0.26279.11> to <0.4950.0> in an old incarnation (1) of this node (2)

=SUPERVISOR REPORT==== 21-Feb-2017::18:45:25 ===
     Supervisor: {<0.27432.5557>, rabbit_stomp_client_sup}
     Context:    shutdown_error
     Reason:     noproc
     Offender:   [{pid,<0.27438.5557>},
                  {name,rabbit_stomp_reader},
                  {mfargs,
                      {rabbit_stomp_reader,start_link,
                          [<0.27435.5557>,<0.27436.5557>,
                           {stomp_configuration,"guest","guest",false,
                               false}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 21-Feb-2017::18:53:03 ===
     Supervisor: {<0.10211.5558>, rabbit_stomp_client_sup}
     Context:    shutdown_error
     Reason:     noproc
     Offender:   [{pid,<0.10216.5558>},
                  {name,rabbit_stomp_reader},
                  {mfargs,
                      {rabbit_stomp_reader,start_link,
                          [<0.10213.5558>,<0.10215.5558>,
                           {stomp_configuration,"guest","guest",false,
                               false}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 21-Feb-2017::19:03:45 ===
     Supervisor: {<0.370.5559>, rabbit_stomp_client_sup}
     Context:    shutdown_error
     Reason:     noproc
     Offender:   [{pid,<0.375.5559>},
                  {name,rabbit_stomp_reader},
                  {mfargs,
                      {rabbit_stomp_reader,start_link,
                          [<0.372.5559>,<0.374.5559>,
                           {stomp_configuration,"guest","guest",false,
                               false}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]

DISC node errors

=WARNING REPORT==== 19-Feb-2017::00:00:21 ===
Mnesia('rabbit@prod-messaging20'): ** WARNING ** Mnesia is overloaded: {dump_log,time_threshold}


I've considered doing an orderly shutdown and restart of the two working nodes (disc and RAM), one at a time, to see if that might clear the issues I'm having.
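Concretely, I was thinking of something like the following on each node in turn, verifying it rejoins before moving on (my understanding is that the disc node should be the last stopped and first started, since the last disc node to go down must come back first):

```shell
# Restart the broker application on one node without stopping the
# Erlang VM, then confirm cluster membership before touching the next:
rabbitmqctl stop_app
rabbitmqctl start_app
rabbitmqctl cluster_status
```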

Has anyone experienced this before?

Thanks in advance =)

[nacho@prod-messaging21.ma01 rabbitmq]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 96026
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 10000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


Michael Klishin

Feb 21, 2017, 7:40:39 PM2/21/17
to rabbitm...@googlegroups.com
None of the errors in the log have anything directly to do with stuck channels.
Restarting nodes should help but 3.2.x has been out of any kind of support for a few years now. Please upgrade to 3.6.6.