Is this the right fix for PUP-1106?

Andy Parker

unread,

Sep 17, 2014, 1:12:31 PM9/17/14

to puppe...@googlegroups.com

Adrien has put together a change to address PUP-1106 (failed dependencies are not honored on refresh), however this is a really thorny issue and I'd like to get some more eyes on it before committing to the change.

I think the original bug in redmine has the most history (http://projects.puppetlabs.com/issues/5876) and unfortunately it doesn't look like there was a clear decision about what the correct behavior is. I think John's description of the semantics are pretty good, but I find it a little hard to unravel it into what needs to change.

Adrien's solution is to make it so that a resource *does not process refresh events* when it is being skipped. Skipping can happen because of failed dependencies or because the resource isn't scheduled. I'm going to keep looking over it and will try it out some, but I think others should kick these tires a bit as well.

--

Andrew Parker

an...@puppetlabs.com

Freenode: zaphod42

Twitter: @aparker42

Software Developer

Join us at PuppetConf 2014, September 20-24 in San Francisco - www.puppetconf.com

Nick Fagerlund

unread,

Sep 17, 2014, 2:56:00 PM9/17/14

to puppe...@googlegroups.com

I think I'm in favor of this. I can imagine some case where this isn't the behavior you want, but it has a major advantage over any other behavior: it's explainable and intuitive. Behaving predictably is a feature!

Felix Frank

unread,

Sep 17, 2014, 3:00:39 PM9/17/14

to puppe...@googlegroups.com

On 09/17/2014 07:12 PM, Andy Parker wrote:
> Adrien has put together a change to address PUP-1106 (failed
> dependencies are not honored on refresh), however this is a really
> thorny issue and I'd like to get some more eyes on it before
> committing to the change.
>
> I think the original bug in redmine has the most history
> (http://projects.puppetlabs.com/issues/5876) and unfortunately it
> doesn't look like there was a clear decision about what the correct
> behavior is. I think John's description of the semantics are pretty
> good, but I find it a little hard to unravel it into what needs to change.
>
> Adrien's solution is to make it so that a resource *does not process
> refresh events* when it is being skipped. Skipping can happen because
> of failed dependencies or because the resource isn't scheduled. I'm
> going to keep looking over it and will try it out some, but I think
> others should kick these tires a bit as well.

Oh, the nostalgia.

The issue got somewhat watered down back in the day, with questions
about the semantics of refreshonly execs especially when they don't get
triggered. I don't think this belongs into this discussion at all and
can be regarded after the refresh madness itself is sorted out.

Which this fix will. I think.

There are other exciting cases in there, such as "is a resource failed
if it's in sync but the refresh fails" and so forth. I feel that those
are more relevant, but should probably be handled on their own. This
should be possible, and I don't see that it would be contradictory for
the proposed design. In other words, fixes for related issues can likely
be implemented independently of merging this one.

Specifically, one of the more disturbing examples from the ticket is this:

http://projects.puppetlabs.com/issues/5876#note-16

This one is not fixed through the proposed change. It will need another
change to the effect that a failure to refresh constitutes a resource
failure as well, I assume. Which it should. But, and I repeat myself on
purpose, this should be a different discussion.

Cheers,
Felix

John Bollinger

unread,

Sep 17, 2014, 3:21:24 PM9/17/14

to puppe...@googlegroups.com

On Wednesday, September 17, 2014 12:12:31 PM UTC-5, Andy Parker wrote:

Adrien has put together a change to address PUP-1106 (failed dependencies are not honored on refresh), however this is a really thorny issue and I'd like to get some more eyes on it before committing to the change.

I think the original bug in redmine has the most history (http://projects.puppetlabs.com/issues/5876) and unfortunately it doesn't look like there was a clear decision about what the correct behavior is. I think John's description of the semantics are pretty good, but I find it a little hard to unravel it into what needs to change.

Adrien's solution is to make it so that a resource *does not process refresh events* when it is being skipped. Skipping can happen because of failed dependencies or because the resource isn't scheduled. I'm going to keep looking over it and will try it out some, but I think others should kick these tires a bit as well.

Referring back to my comments on Redmine 5876, it sounds like that corresponds to ensuring that this expectation holds:

> (4) a resource that is not synchronized cannot be refreshed, regardless of what resource notifies it or to what other resource it may be subscribed.

I hadn't considered reasons for skipping synchronization other than failure of dependencies, but I absolutely agree that resources skipped for any reason should not refresh. But what about resources that themselves fail? Such resources also should not refresh. Does that already work, and/or is it addressed by Adrien's patch?

Further question: an Exec that is considered in sync because of its 'creates', 'onlyif', and/or 'unless' parameter should not also be considered skipped. Is it necessary or appropriate for such a resource to avoid refreshing, too? Moreover, is it necessary or appropriate for the provider to evaluate those parameters (again) at refresh time?

Additionally, there are some other behaviors around signaling and syncing that are discussed in the commentary on Redmine 5876, but that go a bit beyond the reported buggy behavior. These don't necessarily have to be addressed to close the issue, but if the commentary seems valuable then I hope it can be put somewhere less likely to be lost upon closure of the issue. In particular, I would really like to see this implemented:

> (6) Execs should not be construed as having the same synchronization and refresh actions. Instead, they should be construed as having only one or the other.

Perhaps Execs could retain a possibility to run twice (as they will do now if they receive a signal and are not refreshonly), though I have never come across an example where that behavior was invoked intentionally. If this is indeed implemented, then I think it would provide direction on the question I raised above about the 'creates', 'onlyif', and 'unless' parameters: that they should be processed once, to control the (at most one) execution of the Exec's command.

There is also the question of the appropriate effect of a resource successfully syncing but failing to refresh. Should that cause dependent resources to be skipped? If not, then should it maybe cause refreshes of dependent resources to be skipped (supposing they receive a signal from some other resource)?

Adrien Thebo

unread,

Sep 17, 2014, 3:50:15 PM9/17/14

to puppe...@googlegroups.com

On Wed, Sep 17, 2014 at 12:00 PM, Felix Frank <Felix...@alumni.tu-berlin.de> wrote:

[snip]

There are other exciting cases in there, such as "is a resource failed
if it's in sync but the refresh fails" and so forth. I feel that those
are more relevant, but should probably be handled on their own. This
should be possible, and I don't see that it would be contradictory for
the proposed design. In other words, fixes for related issues can likely
be implemented independently of merging this one.

Specifically, one of the more disturbing examples from the ticket is this:

http://projects.puppetlabs.com/issues/5876#note-16

This one is not fixed through the proposed change. It will need another
change to the effect that a failure to refresh constitutes a resource
failure as well, I assume. Which it should. But, and I repeat myself on
purpose, this should be a different discussion.

Good points. I think that this issue would be a great candidate to fix for Puppet 4.0, but I think we should extract that to a separate ticket and address that separately. (For the fun of it I checked and I think it's a one line change to https://github.com/puppetlabs/puppet/blob/3.7.0/lib/puppet/transaction.rb#L224 to check for `s.failed_to_restart?` )