Is reload-agents known to be unreliable if mcollective has lost its STOMP connection?
When I run "/etc/init.d/mcollective reload-agents", it sends a USR1 signal to mcollectived
to cause it to reload its agents.
Usually, this works fine. But if I do this when mcollectived has lost its STOMP connection
(because I restarted the RabbitMQ server at around the same time), the results are unreliable.
It may work okay, or it may leave mcollectived with some agents/plugins missing. For example,
here is a fragment from /var/log/mcollective.log during a failure case:
I, [2015-11-18T15:15:23.735468 #11689] INFO -- : rabbitmq.rb:15:in `on_connected' Connected to stomp://mcollective@ms1:61613
E, [2015-11-18T15:19:50.982806 #11689] ERROR -- : rabbitmq.rb:30:in `on_miscerr' Unexpected error on connection stomp://mcollective@ms1:61613: es_recv: connection.receive returning EOF as nil - resetting connection.
I, [2015-11-18T15:19:50.985885 #11689] INFO -- : rabbitmq.rb:10:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@ms1:61613
I, [2015-11-18T15:19:50.993417 #11689] INFO -- : rabbitmq.rb:25:in `on_connectfail' TCP Connection to stomp://mcollective@ms1:61613 failed on attempt 0
I, [2015-11-18T15:19:56.398467 #11689] INFO -- : runner.rb:24:in `initialize' Reloading all agents after receiving USR1 signal
E, [2015-11-18T15:19:56.400925 #11689] ERROR -- : rabbitmq.rb:30:in `on_miscerr' Unexpected error on connection stomp://mcollective@ms1:61613: es_oldrecv: receive failed: Stomp::Error::NoCurrentConnection
I, [2015-11-18T15:19:56.401329 #11689] INFO -- : rabbitmq.rb:10:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@ms1:61613
I, [2015-11-18T15:19:56.444731 #11689] INFO -- : rabbitmq.rb:15:in `on_connected' Connected to stomp://mcollective@ms1:61613
E, [2015-11-18T15:19:57.778045 #11689] ERROR -- : agents.rb:138:in `dispatch' Execution of rpcutil failed: No plugin rpcutil_agent defined
E, [2015-11-18T15:19:57.778889 #11689] ERROR -- : agents.rb:139:in `dispatch' /usr/lib/ruby/site_ruby/1.8/mcollective/pluginmanager.rb:73:in `[]'
In that case, I restarted the RabbitMQ server, ran "/etc/init.d/mcollective reload-agents"
and then ran an mco command that tried to use the rpcutil agent.
You'll notice that after it had supposedly reloaded all agents, mcollective seemed to
no longer have the "rpcutil_agent" plugin. This situation persisted until I ran
reload-agents again.
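To spot this condition without reading the log by hand, something like the following could be used. It is only a minimal sketch: it assumes the two message formats shown in the fragment above ("Reloading all agents after receiving USR1 signal" and "No plugin ... defined"), and the function name is my own invention, not part of mcollective.

```python
import re

# Message texts copied from the log fragment above; other mcollective
# versions may word them differently (assumption).
RELOAD_MARKER = "Reloading all agents after receiving USR1 signal"
MISSING_PLUGIN = re.compile(r"No plugin (\S+) defined")


def missing_plugins_after_reload(lines):
    """Return plugin names reported missing after the most recent reload."""
    missing = []
    seen_reload = False
    for line in lines:
        if RELOAD_MARKER in line:
            # A newer reload supersedes anything reported before it.
            seen_reload = True
            missing = []
            continue
        match = MISSING_PLUGIN.search(line)
        if seen_reload and match and match.group(1) not in missing:
            missing.append(match.group(1))
    return missing
```

Fed the lines of /var/log/mcollective.log, a non-empty result would suggest that the last reload left agents missing and that reload-agents needs to be run again. Note that the error only shows up once something actually tries to use the missing agent.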
Has anyone seen anything like this? Is this a known bug?
I can't find an existing bug for this. There was an old one, way back, where the process
There is another unrelated ticket, where the first comment mentions that the USR1 handling
"doesn't work too well anyway because ruby":
https://tickets.puppetlabs.com/browse/MCO-328
Any suggestions? Is there some way I can work around this?
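One workaround I am considering is to hold off on reload-agents until the log suggests the STOMP connection is back. Below is a rough sketch of that idea. It assumes the connector callback names seen in my log (on_connected, on_miscerr, on_connecting, on_connectfail) are a reliable indicator of connection state, and the helper names are hypothetical. I do not know whether this actually avoids the problem, since the reload could still race with a reconnect.

```python
import subprocess
import time

LOG_PATH = "/var/log/mcollective.log"
RELOAD_CMD = ["/etc/init.d/mcollective", "reload-agents"]

# Callback names taken from the log fragment; treating them as a state
# indicator is an assumption, not documented mcollective behaviour.
DOWN_EVENTS = ("`on_miscerr'", "`on_connecting'", "`on_connectfail'")


def connection_is_up(lines):
    """Guess the connector state from the last connection event logged."""
    up = False
    for line in lines:
        if "`on_connected'" in line:
            up = True
        elif any(event in line for event in DOWN_EVENTS):
            up = False
    return up


def read_log(path=LOG_PATH):
    """Read the whole log file as a list of lines."""
    with open(path, errors="replace") as handle:
        return handle.read().splitlines()


def reload_when_connected(read_lines=read_log,
                          reload=lambda: subprocess.call(RELOAD_CMD),
                          attempts=30, delay=2.0):
    """Poll the log and trigger the reload only once the connection looks up.

    Returns True if the reload was triggered, False if we gave up.
    """
    for _ in range(attempts):
        if connection_is_up(read_lines()):
            reload()
            return True
        time.sleep(delay)
    return False
```

If the log has been rotated and contains no connection events at all, this treats the connection as down and never reloads, so it would need more care before real use.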
Thanks in advance for any ideas or information on this.