Rabbitmq cluster fails to start because of shovel plugin

809 views
Skip to first unread message

Sergey Kryazhev

unread,
Aug 6, 2019, 8:18:53 AM8/6/19
to rabbitmq-users
We have a rabbitmq 3.7.7 cluster (3 nodes) deployed in Openshift.
We have faced start failure issue after successfully stop  (needed to make a backup procudure).
The plugins installed [rabbitmq_management,rabbitmq_shovel,rabbitmq_shovel_management].
2 from 3 nodes were not able to start with the following errors:
2019-08-04 01:18:39.191 [error] <0.9026.297> \nError description:\n    rpc:'-handle_call_call/6-fun-0-'/5 line 197\n    rabbit:start_it/1 line 450\n    rabbit:broker_start/0 line 326\n    rabbit:start_apps/2 line 546\n    app_utils:manage_applications/6 line 126\n    lists:foldl/3 line 1263\n    rabbit:'-handle_app_error/1-fun-0-'/3 line 642\nthrow:{could_not_start,rabbitmq_shovel,\n       {rabbitmq_shovel,\n        {{shutdown,\n          {failed_to_start_child,rabbit_shovel_dyn_worker_sup_sup,\n           {'EXIT',\n            {{badmatch,\n              {'EXIT',\n               {{noproc,\n                 {gen_server,call,[<9540.1035.0>,which_children,infinity]}},\n                {gen_server2,call,\n                 [<0.10397.297>,{init,<0.10388.297>},infinity]}}}},\n             [{rabbit_shovel_dyn_worker_sup_sup,start_link,0,\n               [{file,""src/rabbit_shovel_dyn_worker_sup_sup.erl""},{line,29}]},\n              {supervisor2,do_start_child,2,\n               [{file,""src/supervisor2.erl""},{line,375}]},\n              {supervisor2,start_children,3,\n               [{file,""src/supervisor2.erl""},{line,358}]},\n              {supervisor2,init_children,2,\n               [{file,""src/supervisor2.erl""},{line,324}]},\n              {gen_server,init_it,2,[{file,""gen_server.erl""},{line,365}]},\n              {gen_server,init_it,6,[{file,""gen_server.erl""},{line,333}]},\n              {proc_lib,init_p_do_apply,3,\n               [{file,""proc_lib.erl""},{line,247}]}]}}}},\n         {rabbit_shovel,start,[normal,[]]}}}}\nLog file(s) (may contain more information):\n   <stdout>



and
2019-08-04 01:30:29.857 [error] <0.18166.4361> CRASH REPORT Process <0.18166.4361> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,rabbit_shovel_dyn_worker_sup_sup,{'EXIT',{{badmatch,{'EXIT',{{noproc,{gen_server,call,[<9540.686.0>,which_children,infinity]}},{gen_server2,call,[<0.18167.4361>,{init,<0.18176.4361>},infinity]}}}},[{rabbit_shovel_dyn_worker_sup_sup,start_link,0,[{file,""src/rabbit_shovel_dyn_worker_sup_sup.erl""},{line,29}]},{supervisor2,do_start_child,2,[{file,""src/supervisor2.erl""},{line,375}]},{supervisor2,start_children,3,[{file,""src/supervisor2.erl""},{line,358}]},{supervisor2,...},...]}}}},...} in application_master:init/4 line 134



Please assist  with possible root cause.
Dous this issue may cause it https://github.com/rabbitmq/rabbitmq-shovel/issues/46 ?
Do we mandatory need to upgrade rabbitmq to the latest version ?

Thank you

Sergey Kryazhev

unread,
Aug 6, 2019, 8:26:28 AM8/6/19
to rabbitmq-users
More detailed stack during boot
BOOT FAILED
===========

Error description:

    rpc
:'-handle_call_call/6-fun-0-'/5 line 197

    rabbit
:start_it/1 line 450
    rabbit
:broker_start/0 line 326
    rabbit
:start_apps/2 line 546
    app_utils
:manage_applications/6 line 126
    lists
:foldl/3 line 1263

    rabbit
:'-handle_app_error/1-fun-0-'/3 line 642
throw:{could_not_start,rabbitmq_shovel,
       
{rabbitmq_shovel,

       
{{shutdown,
         
{failed_to_start_child,rabbit_shovel_dyn_worker_sup_sup,
           
{'EXIT',
           
{{badmatch,
             
{'EXIT',
               
{{noproc,

                 
{gen_server,call,[<9540.1035.0>,which_children,infinity]}},
               
{gen_server2,call,

                 
[<0.10397.297>,{init,<0.10388.297>},infinity]}}}},

             
[{rabbit_shovel_dyn_worker_sup_sup,start_link,0,
               
[{file,"src/rabbit_shovel_dyn_worker_sup_sup.erl"},{line,29}]},
             
{supervisor2,do_start_child,2,
               
[{file,"src/supervisor2.erl"},{line,375}]},
             
{supervisor2,start_children,3,
               
[{file,"src/supervisor2.erl"},{line,358}]},

             
{supervisor2,init_children,2,

               
[{file,"src/supervisor2.erl"},{line,324}]},

             
{gen_server,init_it,2,[{file,"gen_server.erl"},{line,365}]},

             
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,333}]},

             
{proc_lib,init_p_do_apply,3,

               
[{file,"proc_lib.erl"},{line,247}]}]}}}},

         
{rabbit_shovel,start,[normal,[]]}}}}


Karl Nilsson

unread,
Aug 7, 2019, 4:39:18 AM8/7/19
to rabbitm...@googlegroups.com
looks like the mirrored supervisor is struggling to start due to a missing process. Can you post the full log for the start attempt?

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/e5c06dd0-a3de-49d5-92be-b58ac5ececc4%40googlegroups.com.


--
Karl Nilsson

Pivotal/RabbitMQ

Sergey Kryazhev

unread,
Aug 7, 2019, 6:17:29 AM8/7/19
to rabbitmq-users
Hi,

Thanks for reply.

Attached below all we have now. Let me specify scenario:
1. Stop sequentally rabbit nodes using rabbitmqctl stop_app
2. Making backup (just copy datafiles)
3. Start simulteniously all 3 nodes using rabbitmqctl start_app

[INFO/MainProcess] starting pods in parallel...
[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-3] child process calling self.run()
[INFO/PoolWorker-2] STDOUT: Starting node rabbit@rmqlocal-1-0 ...

[INFO/PoolWorker-1] STDOUT: Starting node rabbit@rmqlocal-0-0 ...

[INFO/PoolWorker-3] STDOUT: Starting node rabbit@rmqlocal-2-0 ...

[INFO/PoolWorker-1] STDOUT:  completed with 6 plugins.

[INFO/PoolWorker-2] STDOUT:
BOOT FAILED
Log file(s) (may contain more information):

   
<stdout>


[INFO/PoolWorker-3] STDOUT:
BOOT FAILED
===========

Error description:
    rpc
:'-handle_call_call/6-fun-0-'/5 line 197
    rabbit
:start_it/1 line 450
    rabbit
:broker_start/0 line 326
    rabbit
:start_apps/2 line 546
    app_utils
:manage_applications/6 line 126
    lists
:foldl/3 line 1263
    rabbit
:'-handle_app_error/1-fun-0-'/3 line 642
throw:{could_not_start,rabbitmq_shovel,
       
{rabbitmq_shovel,
       
{{shutdown,
         
{failed_to_start_child,rabbit_shovel_dyn_worker_sup_sup,
           
{'EXIT',
           
{{badmatch,
             
{'EXIT',
               
{{noproc,

                 
{gen_server,call,[<9540.686.0>,which_children,infinity]}},
               
{gen_server2,call,
                 
[<0.16988.4361>,{init,<0.16997.4361>},infinity]}}}},

             
[{rabbit_shovel_dyn_worker_sup_sup,start_link,0,
               
[{file,"src/rabbit_shovel_dyn_worker_sup_sup.erl"},{line,29}]},
             
{supervisor2,do_start_child,2,
               
[{file,"src/supervisor2.erl"},{line,375}]},
             
{supervisor2,start_children,3,
               
[{file,"src/supervisor2.erl"},{line,358}]},
             
{supervisor2,init_children,2,
               
[{file,"src/supervisor2.erl"},{line,324}]},
             
{gen_server,init_it,2,[{file,"gen_server.erl"},{line,365}]},
             
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,333}]},
             
{proc_lib,init_p_do_apply,3,
               
[{file,"proc_lib.erl"},{line,247}]}]}}}},
         
{rabbit_shovel,start,[normal,[]]}}}}
Log file(s) (may contain more information):

   
<stdout>


[INFO/PoolWorker-2] STDERR: Error:
{:could_not_start, :rabbitmq_shovel, {:rabbitmq_shovel, {{:shutdown, {:failed_to_start_child, :rabbit_shovel_dyn_worker_sup_sup, {:EXIT, {{:badmatch, {:EXIT, {{:noproc, {:gen_server, :call, [#PID<9777.1035.0>, :which_children, :infinity]}}, {:gen_server2, :call, [#PID<9336.10397.297>, {:init, #PID<9336.10388.297>}, :infinity]}}}}, [{:rabbit_shovel_dyn_worker_sup_sup, :start_link, 0, [file: 'src/rabbit_shovel_dyn_worker_sup_sup.erl', line: 29]}, {:supervisor2, :do_start_child, 2, [file: 'src/supervisor2.erl', line: 375]}, {:supervisor2, :start_children, 3, [file: 'src/supervisor2.erl', line: 358]}, {:supervisor2, :init_children, 2, [file: 'src/supervisor2.erl', line: 324]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 365]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 333]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}}}, {:rabbit_shovel, :start, [:normal, []]}}}}

[INFO/PoolWorker-3] STDERR: Error:
{:could_not_start, :rabbitmq_shovel, {:rabbitmq_shovel, {{:shutdown, {:failed_to_start_child, :rabbit_shovel_dyn_worker_sup_sup, {:EXIT, {{:badmatch, {:EXIT, {{:noproc, {:gen_server, :call, [#PID<9777.686.0>, :which_children, :infinity]}}, {:gen_server2, :call, [#PID<9336.16988.4361>, {:init, #PID<9336.16997.4361>}, :infinity]}}}}, [{:rabbit_shovel_dyn_worker_sup_sup, :start_link, 0, [file: 'src/rabbit_shovel_dyn_worker_sup_sup.erl', line: 29]}, {:supervisor2, :do_start_child, 2, [file: 'src/supervisor2.erl', line: 375]}, {:supervisor2, :start_children, 3, [file: 'src/supervisor2.erl', line: 358]}, {:supervisor2, :init_children, 2, [file: 'src/supervisor2.erl', line: 324]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 365]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 333]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}}}, {:rabbit_shovel, :start, [:normal, []]}}}}

[INFO/PoolWorker-1] process shutting down
[INFO/PoolWorker-2] process shutting down
[INFO/PoolWorker-3] process shutting down
[INFO/PoolWorker-1] process exiting with exitcode 0
[INFO/PoolWorker-3] process exiting with exitcode 0
[INFO/PoolWorker-2] process exiting with exitcode 0
[INFO/MainProcess] starting pods completed.
[INFO/MainProcess] removing backup flag file for pod rmqlocal-0-0
[INFO/MainProcess] removing backup flag file for pod rmqlocal-1-0
[INFO/MainProcess] removing backup flag file for pod rmqlocal-2-0

There are errors like
Channel error on connection <0.1301.2589> (172.20.22.1:52180 -> 172.20.10.144:5672, vhost: '/', user: 'rabProjectprod'), channel 1:\noperation queue.bind caused a channel exception not_found: no binding CANCELLED between exchange 'prod-quote-state-change-v2' in vhost '/' and queue 'prod.company.cancel-or-expire-v2' in vhost '/'"    


"
2019-08-04 01:18:39.174 [error] <0.6872.2588> Error on AMQP connection <0.6872.2588> (172.20.30.1:36234 -> 172.20.10.144:5672 - rabbitConnectionFactory#8c7347:63, vhost: '/', user: 'rabProjectprod', state: running), channel 1:\n {{shutdown,{gen_server,call,[<9541.9324.297>,which_children,infinity]}},\n [{gen_server,call,3,[{file,""gen_server.erl""},{line,214}]},\n  {supervisor2,find_child,2,[{file,""src/supervisor2.erl""},{line,254}]},\n  {rabbit_amqqueue_sup_sup,find_for_vhost,2,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,61}]},\n  {rabbit_amqqueue_sup_sup,start_queue_process,3,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,45}]},\n  {rabbit_amqqueue,declare,7,[{file,""src/rabbit_amqqueue.erl""},{line,348}]},\n  {rabbit_channel,handle_method,5,\n                  [{file,""src/rabbit_channel.erl""},{line,2167}]},\n  {rabbit_channel,handle_method,3,\n                  [{file,""src/rabbit_channel.erl""},{line,1404}]},\n  {rabbit_channel,handle_cast,2,\n                  [{file,""src/rabbit_channel.erl""},{line,527}]}]}"

   
2019-08-04T01:18:39.000Z,"logging-fluentd-9q686","2019-08-04 01:18:39.175 [error] <0.6860.2588> Error on AMQP connection <0.6860.2588> (172.20.22.1:55768 -> 172.20.10.144:5672 - ProjectConnectionFactory#16dff96:12130, vhost: '/', user: 'rabProjectprod', state: running), channel 1:\n {{shutdown,{gen_server,call,[<9541.9324.297>,which_children,infinity]}},\n [{gen_server,call,3,[{file,""gen_server.erl""},{line,214}]},\n  {supervisor2,find_child,2,[{file,""src/supervisor2.erl""},{line,254}]},\n  {rabbit_amqqueue_sup_sup,find_for_vhost,2,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,61}]},\n  {rabbit_amqqueue_sup_sup,start_queue_process,3,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,45}]},\n  {rabbit_amqqueue,declare,7,[{file,""src/rabbit_amqqueue.erl""},{line,348}]},\n  {rabbit_channel,handle_method,5,\n                  [{file,""src/rabbit_channel.erl""},{line,2167}]},\n  {rabbit_channel,handle_method,3,\n                  [{f

Error:
{:could_not_start, :rabbitmq_shovel, {:rabbitmq_shovel, {{:shutdown, {:failed_to_start_child, :rabbit_shovel_dyn_worker_sup_sup, {:EXIT, {{:badmatch, {:EXIT, {{:noproc, {:gen_server, :call, [#PID<9777.1035.0>, :which_children, :infinity]}}, {:gen_server2, :call, [#PID<9336.10397.297>, {:init, #PID<9336.10388.297>}, :infinity]}}}}, [{:rabbit_shovel_dyn_worker_sup_sup, :start_link, 0, [file: 'src/rabbit_shovel_dyn_worker_sup_sup.erl', line: 29]}, {:supervisor2, :do_start_child, 2, [file: 'src/supervisor2.erl', line: 375]}, {:supervisor2, :start_children, 3, [file: 'src/supervisor2.erl', line: 358]}, {:supervisor2, :init_children, 2, [file: 'src/supervisor2.erl', line: 324]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 365]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 333]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}}}, {:rabbit_shovel, :start, [:normal, []]}}}}


To unsubscribe from this group and stop receiving emails from it, send an email to rabbitm...@googlegroups.com.


--
Karl Nilsson

Pivotal/RabbitMQ

Karl Nilsson

unread,
Aug 7, 2019, 6:24:00 AM8/7/19
to rabbitm...@googlegroups.com
I'd still like to see the full rabbitmq server logs

To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/162e32eb-e67f-4272-9102-cbe50632bde8%40googlegroups.com.


--
Karl Nilsson

Pivotal/RabbitMQ

Sergey Kryazhev

unread,
Aug 7, 2019, 6:24:09 AM8/7/19
to rabbitmq-users
Adding logs


Error:
{:could_not_start, :rabbitmq_shovel, {:rabbitmq_shovel, {{:shutdown, {:failed_to_start_child, :rabbit_shovel_dyn_worker_sup_sup, {:EXIT, {{:badmatch, {:EXIT, {{:noproc, {:gen_server, :call, [#PID<9777.1035.0>, :which_children, :infinity]}}, {:gen_server2, :call, [#PID<9336.10397.297>, {:init, #PID<9336.10388.297>}, :infinity]}}}}, [{:rabbit_shovel_dyn_worker_sup_sup, :start_link, 0, [file: 'src/rabbit_shovel_dyn_worker_sup_sup.erl', line: 29]}, {:supervisor2, :do_start_child, 2, [file: 'src/supervisor2.erl', line: 375]}, {:supervisor2, :start_children, 3, [file: 'src/supervisor2.erl', line: 358]}, {:supervisor2, :init_children, 2, [file: 'src/supervisor2.erl', line: 324]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 365]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 333]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}}}, {:rabbit_shovel, :start, [:normal, []]}}}}


Channel error on connection <0.1301.2589> (172.20.22.1:52180 -> 172.20.10.144:5672, vhost: '/', user: 'rabProjectprod'), channel 1:\noperation queue.bind caused a channel exception not_found: no binding CANCELLED between exchange 'prod-quote-state-change-v2' in vhost '/' and queue 'prod.Project-Project.cancel-or-expire-v2' in vhost '/'"    

"
2019-08-04 01:18:39.174 [error] <0.6872.2588> Error on AMQP connection <0.6872.2588> (172.20.30.1:36234 -> 172.20.10.144:5672 - rabbitConnectionFactory#8c7347:63, vhost: '/', user: 'rabProjectprod', state: running), channel 1:\n {{shutdown,{gen_server,call,[<9541.9324.297>,which_children,infinity]}},\n [{gen_server,call,3,[{file,""gen_server.erl""},{line,214}]},\n  {supervisor2,find_child,2,[{file,""src/supervisor2.erl""},{line,254}]},\n  {rabbit_amqqueue_sup_sup,find_for_vhost,2,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,61}]},\n  {rabbit_amqqueue_sup_sup,start_queue_process,3,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,45}]},\n  {rabbit_amqqueue,declare,7,[{file,""src/rabbit_amqqueue.erl""},{line,348}]},\n  {rabbit_channel,handle_method,5,\n                  [{file,""src/rabbit_channel.erl""},{line,2167}]},\n  {rabbit_channel,handle_method,3,\n                  [{file,""src/rabbit_channel.erl""},{line,1404}]},\n  {rabbit_channel,handle_cast,2,\n                  [{file,""src/rabbit_channel.erl""},{line,527}]}]}"    
2019-08-04T01:18:39.000Z,"logging-fluentd-9q686","2019-08-04 01:18:39.175 [error] <0.6860.2588> Error on AMQP connection <0.6860.2588> (172.20.22.1:55768 -> 172.20.10.144:5672 - ProjectProjectConnectionFactory#16dff96:12130, vhost: '/', user: 'rabProjectprod', state: running), channel 1:\n {{shutdown,{gen_server,call,[<9541.9324.297>,which_children,infinity]}},\n [{gen_server,call,3,[{file,""gen_server.erl""},{line,214}]},\n  {supervisor2,find_child,2,[{file,""src/supervisor2.erl""},{line,254}]},\n  {rabbit_amqqueue_sup_sup,find_for_vhost,2,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,61}]},\n  {rabbit_amqqueue_sup_sup,start_queue_process,3,\n                           [{file,""src/rabbit_amqqueue_sup_sup.erl""},\n                            {line,45}]},\n  {rabbit_amqqueue,declare,7,[{file,""src/rabbit_amqqueue.erl""},{line,348}]},\n  {rabbit_channel,handle_method,5,\n                  [{file,""src/rabbit_channel.erl""},{line,2167}]},\n  {rabbit_channel,handle_method,3,\n                  [{f



               
[{file,<span st

Sergey Kryazhev

unread,
Aug 8, 2019, 11:13:43 AM8/8/19
to rabbitmq-users
Hi,
I have attached archive with 3 logs per each node, during start up issue
2019-08-04T01:30:*

Please assist if possible.


--
Karl Nilsson

Pivotal/RabbitMQ

Sergey Kryazhev

unread,
Aug 9, 2019, 8:44:53 AM8/9/19
to rabbitmq-users
rabbit.zip

Sergey Kryazhev

unread,
Aug 13, 2019, 8:51:30 AM8/13/19
to rabbitmq-users
Sorry , does provided logs sufficient  ?
Thank you

On Wednesday, August 7, 2019 at 1:24:00 PM UTC+3, Karl Nilsson wrote:


--
Karl Nilsson

Pivotal/RabbitMQ

Michael Klishin

unread,
Oct 3, 2019, 6:17:34 AM10/3/19
to rabbitmq-users
All this tells us is that a Shovel failed to start because a supervisor process does not exist.

My best guess is that Shovels in your system failed so many times in a short period of time that it crossed
the supervisor restart threshold, the supervisor then shut down and Shovels cannot be restarted.

In modern versions the rate has to be above 100 failures  a second IIRC.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/adf0c020-c2ea-4b20-98aa-46616045443a%40googlegroups.com.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ
Reply all
Reply to author
Forward
0 new messages