mcollective reload-agents unreliable?


Lorcan Hamill

Nov 18, 2015, 7:12:12 PM
to Puppet Users
Is reload-agents known to be unreliable, if mcollective has lost its STOMP connection?

Let me explain:

When I run "/etc/init.d/mcollective reload-agents", it sends a USR1 signal to mcollectived
to cause it to reload its agents.

Usually, this works fine. But if I do this when mcollectived has lost its STOMP connection
(because I restart the RabbitMQ server at around the same time), the results are unreliable. It may
work okay, or it may leave mcollectived with some missing agents/plugins. For example,
here is a fragment from /var/log/mcollective.log during a failure case:

I, [2015-11-18T15:15:23.735468 #11689]  INFO -- : rabbitmq.rb:15:in `on_connected' Connected to stomp://mcollective@ms1:61613
E, [2015-11-18T15:19:50.982806 #11689] ERROR -- : rabbitmq.rb:30:in `on_miscerr' Unexpected error on connection stomp://mcollective@ms1:61613: es_recv: connection.receive returning EOF as nil - resetting connection.
I, [2015-11-18T15:19:50.985885 #11689]  INFO -- : rabbitmq.rb:10:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@ms1:61613
I, [2015-11-18T15:19:50.993417 #11689]  INFO -- : rabbitmq.rb:25:in `on_connectfail' TCP Connection to stomp://mcollective@ms1:61613 failed on attempt 0
I, [2015-11-18T15:19:56.398467 #11689]  INFO -- : runner.rb:24:in `initialize' Reloading all agents after receiving USR1 signal
E, [2015-11-18T15:19:56.400925 #11689] ERROR -- : rabbitmq.rb:30:in `on_miscerr' Unexpected error on connection stomp://mcollective@ms1:61613: es_oldrecv: receive failed: Stomp::Error::NoCurrentConnection
I, [2015-11-18T15:19:56.401329 #11689]  INFO -- : rabbitmq.rb:10:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@ms1:61613
I, [2015-11-18T15:19:56.444731 #11689]  INFO -- : rabbitmq.rb:15:in `on_connected' Connected to stomp://mcollective@ms1:61613
E, [2015-11-18T15:19:57.778045 #11689] ERROR -- : agents.rb:138:in `dispatch' Execution of rpcutil failed: No plugin rpcutil_agent defined
E, [2015-11-18T15:19:57.778889 #11689] ERROR -- : agents.rb:139:in `dispatch' /usr/lib/ruby/site_ruby/1.8/mcollective/pluginmanager.rb:73:in `[]'

In that case, I restarted the RabbitMQ server, ran "/etc/init.d/mcollective reload-agents"
and then ran an mco command that tried to use the rpcutil agent.

You'll notice that after it had supposedly reloaded all agents, mcollective seemed to
no longer have the "rpcutil_agent" plugin.  This situation persisted until I ran
reload-agents again.
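
The symptom in the log fragment above can be looked for after the fact. The following is only a rough Python sketch: it assumes the log keeps the line format shown in the fragment and that a lost agent shows up as a "No plugin ... defined" error. The function name and regular expressions are made up for illustration and are not part of mcollective. It also only catches a missing plugin once something has tried to use it.

```python
import re

# Line format assumed from the log fragment above, e.g.
# E, [2015-11-18T15:19:57.778045 #11689] ERROR -- : agents.rb:138:in `dispatch' ...
LINE_RE = re.compile(
    r"^(?P<sev>[A-Z]), \[(?P<ts>\S+) #(?P<pid>\d+)\]\s+(?P<level>[A-Z]+) -- : (?P<msg>.*)$"
)
RELOAD_MARKER = "Reloading all agents after receiving USR1 signal"
MISSING_RE = re.compile(r"No plugin (\S+) defined")


def missing_plugins_after_last_reload(lines):
    """Return plugin names reported missing since the most recent reload.

    Hypothetical helper: returns an empty list if no reload line was seen
    or if no "No plugin ... defined" error followed the last reload.
    """
    missing = []
    seen_reload = False
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a standard log line
        msg = match.group("msg")
        if RELOAD_MARKER in msg:
            # A newer reload supersedes errors recorded after an older one.
            seen_reload = True
            missing = []
        elif seen_reload:
            hit = MISSING_RE.search(msg)
            if hit and hit.group(1) not in missing:
                missing.append(hit.group(1))
    return missing
```

It could be run against the log file mentioned above, for example `missing_plugins_after_last_reload(open("/var/log/mcollective.log"))`. An empty result does not prove that the reload worked.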

Has anyone seen anything like this?  Is this a known bug?

I can't find an existing bug for this. There was an old one, way back, where the process
actually died in similar circumstances: https://projects.puppetlabs.com/issues/8753

There is another unrelated ticket, where the first comment mentions that the USR1 handling
"doesn't work too well anyway because ruby": https://tickets.puppetlabs.com/browse/MCO-328

Any suggestions?   Is there some way I can work around this?

Thanks in advance, for any ideas or information on this.

R.I.Pienaar

Nov 18, 2015, 7:28:36 PM
to puppet-users


----- Original Message -----
> From: "Lorcan Hamill" <lorcan...@ammeon.com>
> To: "puppet-users" <puppet...@googlegroups.com>
> Sent: Wednesday, November 18, 2015 7:02:31 PM
> Subject: [Puppet Users] mcollective reload-agents unreliable?

> Is reload-agents known to be unreliable, if mcollective has lost its STOMP
> connection?

Yes, I actually thought that feature got removed since it's not usable and
never really worked at all.

Don't use it.

Lorcan Hamill

Nov 19, 2015, 10:41:34 AM
to Puppet Users
Thanks for the quick reply!

> yes, I actually thought that feature got removed since it's not usable and
> never really worked at all.
>
> Don't use it.

It seems to work okay most of the time, as far as I can tell. The trouble comes
if it is used when the daemon is:

1) Still getting initialised
2) Trying to reconnect to the RabbitMQ server.

At either of those times, it seems unreliable.

Do you have any suggestion for what we could do instead, to cause new agents
to come into effect?  We used "restart" in the past, but that caused other issues
due to the way that in-flight RPC calls get discarded.

If there were some way of detecting when reload-agents doesn't work properly,
we could then retry it. But I can't see a way of detecting the problem (except
by looking for errors later on, which is hard to automate).
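
One heuristic would be to flag any reload that arrived during the two windows listed above, based on the connector events that appear in /var/log/mcollective.log. This is a minimal Python sketch under that assumption. The function name and the `assume_connected` parameter are hypothetical, and the event names are taken only from the log fragment earlier in the thread. It can mark a reload as suspect, but it cannot confirm that a reload succeeded.

```python
import re

# Connector callbacks as they appear in the log fragment earlier in the thread.
CONNECTED_RE = re.compile(r"in `on_connected'")
DISCONNECTED_RE = re.compile(r"in `on_(?:miscerr|connectfail|connecting)'")
RELOAD_MARKER = "Reloading all agents after receiving USR1 signal"


def reloads_while_disconnected(lines, assume_connected=False):
    """Return reload log lines seen while the STOMP connection looked down.

    Hypothetical helper. With assume_connected=False, a reload logged before
    the first `on_connected' line (daemon still starting up) is also flagged.
    """
    connected = assume_connected
    suspect = []
    for line in lines:
        if CONNECTED_RE.search(line):
            connected = True
        elif DISCONNECTED_RE.search(line):
            connected = False
        elif RELOAD_MARKER in line and not connected:
            suspect.append(line.rstrip("\n"))
    return suspect
```

A wrapper script could re-send the reload whenever this returns a non-empty list, though that is still a retry based on a guess and not a real fix.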

Again, thanks.

R.I.Pienaar

Nov 19, 2015, 11:01:24 AM
to puppet-users


----- Original Message -----
> From: "Lorcan Hamill" <lorcan...@ammeon.com>
> To: "puppet-users" <puppet...@googlegroups.com>
> Sent: Thursday, November 19, 2015 10:41:34 AM
> Subject: Re: [Puppet Users] mcollective reload-agents unreliable?

> Thanks for the quick reply!
>
>> yes, I actually thought that feature got removed since it's not usable and
>> never really worked at all.
>>
>> Don't use it.
>>
>
> It seems to work okay most of the time, as far as I can tell. The trouble
> comes if it is used when the daemon is:

It really doesn't, in ways you can't even see or debug, and it cannot work. Don't use it.

The only option is to restart the daemon.

Lorcan Hamill

unread,
Nov 19, 2015, 11:03:51 AM11/19/15
to Puppet Users
> It really doesn't, in ways you can't even see or debug, and it cannot work. Don't use it.
>
> The only option is to restart the daemon.

Okay, we'll just have to figure out a way to live with that, I guess. 

Thanks.