Handling failures in prefetch

48 views
Skip to first unread message

Branan Riley

unread,
Jul 21, 2015, 3:40:19 PM7/21/15
to puppe...@googlegroups.com
<This is email 2/2 inspired by PUP-4813>

With puppet 4, we made a change to how prefetch failures are handled
by the transaction[1]. This is actually a pretty good change, as it
prevents puppet from some nasty misbehaviors.

I still don't think it's the "correct" behavior, though. Aborting the
entire catalog because one resource type is broken doesn't seem like
the best behavior, especially when it comes to reporting[2]

I feel like the correct behavior is something like this:

* If an exception escapes a prefetch, mark all resources of that type
  in the catalog as failed and do not try to apply them

* Make it possible to fail resources from *within* a prefetch. This
  lets a sufficiently smart prefetch handle the case when only some of
  its resources are hosed (parsedfile resources with multiple targets,
  for example, might be only partially unfetchable)

If there's agreement on this I'll file the appropriate tickets. If
there are other ideas of how we should handle prefetch failures I'd
also love to hear them.

Thanks,
Branan Riley
Puppet Labs Software Engineer

[1] PUP-3656
[2] See https://groups.google.com/forum/#!topic/puppet-dev/QMTJli2oabc

Trevor Vaughan

unread,
Jul 21, 2015, 4:05:39 PM7/21/15
to puppe...@googlegroups.com
I like all the things you have said here.

Trevor

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CADWDnrmDGO%3DR__fc0YxB4ywcGAbcOnASMKBw%2BPdPusGoG_nbog%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699

-- This account not approved for unencrypted proprietary information --

Kylo Ginsberg

unread,
Jul 22, 2015, 3:44:36 PM7/22/15
to puppe...@googlegroups.com
On Tue, Jul 21, 2015 at 12:40 PM, Branan Riley <bra...@puppetlabs.com> wrote:
<This is email 2/2 inspired by PUP-4813>

With puppet 4, we made a change to how prefetch failures are handled
by the transaction[1]. This is actually a pretty good change, as it
prevents puppet from some nasty misbehaviors.

I still don't think it's the "correct" behavior, though. Aborting the
entire catalog because one resource type is broken doesn't seem like
the best behavior, especially when it comes to reporting[2]

I feel like the correct behavior is something like this:

* If an exception escapes a prefetch, mark all resources of that type
  in the catalog as failed and do not try to apply them

Background for a question: 

Wrt exceptions in prefetch, as of 4.0 there is a distinction between the expected/supported exceptions (there are two: Puppet::MissingCommand and LoadError) and everything else.

I'm not sure this is *super* well documented, but it is mentioned as a breaking change in the 4.0 release notes:


and the rescue in question is here:


So with that in mind, is the proposal here about the handling of the two supported exceptions or the handling of other exceptions?

For the two supported exceptions, this seems like a reasonable behavior change and could also speed up catalog application in this failure case.

For *other* exceptions, I'm not sure. I walked away from PUP-3656 thinking that those two exceptions were now part of the contract we're specifying for providers. If so, a provider that throws any *other* exception has a bug which should be fixed, and we could be making matters worse by trying to soldier on. I.e. this seems like a situation where it's better to fail hard and the user should file a bug, upgrade/downgrade the module, etc.

Btw (and perhaps I should have asked this from the get-go before opining above): what was the exception in PUP-4813 that precipitated this thread?


* Make it possible to fail resources from *within* a prefetch. This
  lets a sufficiently smart prefetch handle the case when only some of
  its resources are hosed (parsedfile resources with multiple targets,
  for example, might be only partially unfetchable)

How would this work? A new hook for providers to implement? If so, it'd be nice to flesh out a couple examples use cases so that we can think through the API with something concrete.


If there's agreement on this I'll file the appropriate tickets. If
there are other ideas of how we should handle prefetch failures I'd
also love to hear them.

Thanks for bringing these issues up! Sometimes innocuous looking tickets bloom into huge "how did this ever work?" conversations and this is kinda one of those :)

Kylo
 

Thanks,
Branan Riley
Puppet Labs Software Engineer

[1] PUP-3656
[2] See https://groups.google.com/forum/#!topic/puppet-dev/QMTJli2oabc

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CADWDnrmDGO%3DR__fc0YxB4ywcGAbcOnASMKBw%2BPdPusGoG_nbog%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Kylo Ginsberg | ky...@puppetlabs.com | irc: kylo | twitter: @kylog

PuppetConf 2015 is coming to Portland, Oregon! Join us October 5-9.
Register now to take advantage of the Early Bird discount save $249!

Branan Riley

unread,
Jul 22, 2015, 4:37:45 PM7/22/15
to puppe...@googlegroups.com


On Wed, Jul 22, 2015 at 12:44 PM Kylo Ginsberg <ky...@puppetlabs.com> wrote:
On Tue, Jul 21, 2015 at 12:40 PM, Branan Riley <bra...@puppetlabs.com> wrote:
<This is email 2/2 inspired by PUP-4813>

With puppet 4, we made a change to how prefetch failures are handled
by the transaction[1]. This is actually a pretty good change, as it
prevents puppet from some nasty misbehaviors.

I still don't think it's the "correct" behavior, though. Aborting the
entire catalog because one resource type is broken doesn't seem like
the best behavior, especially when it comes to reporting[2]

I feel like the correct behavior is something like this:

* If an exception escapes a prefetch, mark all resources of that type
  in the catalog as failed and do not try to apply them

Background for a question: 

Wrt exceptions in prefetch, as of 4.0 there is a distinction between the expected/supported exceptions (there are two: Puppet::MissingCommand and LoadError) and everything else.

I'm not sure this is *super* well documented, but it is mentioned as a breaking change in the 4.0 release notes:


and the rescue in question is here:


So with that in mind, is the proposal here about the handling of the two supported exceptions or the handling of other exceptions?

For the two supported exceptions, this seems like a reasonable behavior change and could also speed up catalog application in this failure case.

For *other* exceptions, I'm not sure. I walked away from PUP-3656 thinking that those two exceptions were now part of the contract we're specifying for providers. If so, a provider that throws any *other* exception has a bug which should be fixed, and we could be making matters worse by trying to soldier on. I.e. this seems like a situation where it's better to fail hard and the user should file a bug, upgrade/downgrade the module, etc.

I would say this should be for ALL exceptions. We absolutely should stop puppet from soldiering on in ways that could cause problems, but I don't think trying to apply other resource types in the catalog is one of those ways. Prefetch failure should be a localized catastrophe, not a global one.

The move away from rescuing everything was definitely good, but aborting the entire transaction when only a single type is unprefetchable is a bit of an over-reaction. We know that instances of that type (really just ones of that type/provider pair) are bad, and we absolutely shouldn't apply them. The rest of the catalog is still fine, though.

The two exceptions that are supported allow a Puppet run to install any packages/gems that a provider might need in order to prefetch. Skipping only the resources that would have been prefetched instead of the entire catalog serves that same purpose.

Aborting the entire catalog also makes remediation with Puppet (when puppet could be used for such) impossible, which seems super bad to me.
 
Btw (and perhaps I should have asked this from the get-go before opining above): what was the exception in PUP-4813 that precipitated this thread?
 
The bad catalog from that ticket set the `target` field of mount to a directory, which causes some sort of IO exception to be raised when ParsedFile tries to access that directory as a regular file.

This is what inspired my bullet point below - ParsedFile should be smart enough to know which resources have a bad target and which are OK, and appropriately fail only the ones it can't work with.
 

* Make it possible to fail resources from *within* a prefetch. This
  lets a sufficiently smart prefetch handle the case when only some of
  its resources are hosed (parsedfile resources with multiple targets,
  for example, might be only partially unfetchable)

How would this work? A new hook for providers to implement? If so, it'd be nice to flesh out a couple examples use cases so that we can think through the API with something concrete.

It's probably already "doable" by having prefetch mark the returned provider instances as bad in some way, and the provider implementation for flush could raise an exception. I'd prefer to see something at the transaction layer, though. I'll think through what this might look like more.
 


If there's agreement on this I'll file the appropriate tickets. If
there are other ideas of how we should handle prefetch failures I'd
also love to hear them.

Thanks for bringing these issues up! Sometimes innocuous looking tickets bloom into huge "how did this ever work?" conversations and this is kinda one of those :)

Kylo
 

Thanks,
Branan Riley
Puppet Labs Software Engineer

[1] PUP-3656
[2] See https://groups.google.com/forum/#!topic/puppet-dev/QMTJli2oabc

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CADWDnrmDGO%3DR__fc0YxB4ywcGAbcOnASMKBw%2BPdPusGoG_nbog%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Kylo Ginsberg | ky...@puppetlabs.com | irc: kylo | twitter: @kylog

PuppetConf 2015 is coming to Portland, Oregon! Join us October 5-9.
Register now to take advantage of the Early Bird discount save $249!

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.

Felix Frank

unread,
Jul 22, 2015, 5:14:00 PM7/22/15
to puppe...@googlegroups.com
Seconded!

Cheers,
Felix


On 07/21/2015 10:05 PM, Trevor Vaughan wrote:

Kylo Ginsberg

unread,
Jul 22, 2015, 10:44:47 PM7/22/15
to puppe...@googlegroups.com
I can see this argument. I'm still concerned that this would mean we aren't surfacing bugs that really should be exposed to the light of day. I suppose we could report the one case with a Warning and the other with an Error.

But assuming we go ahead and do something like this (I see I'm the only one with the slightest reservation :)), I'm thinking we should audit how puppet handles exceptions in providers during catalog application but *outside* prefetch. It would be nice to be consistent throughout catalog application.


Aborting the entire catalog also makes remediation with Puppet (when puppet could be used for such) impossible, which seems super bad to me.

Is it impossible? Couldn't you remediate with a catalog that doesn't include resources that will tickle prefetch from that provider? Not trying to quibble (i.e. yes, that could still be a PITA), but making sure I'm not missing something.
 
 
Btw (and perhaps I should have asked this from the get-go before opining above): what was the exception in PUP-4813 that precipitated this thread?
 
The bad catalog from that ticket set the `target` field of mount to a directory, which causes some sort of IO exception to be raised when ParsedFile tries to access that directory as a regular file.

Ooh fun. So we should fix that to be more robust. Separate ticket than this thread though.
 

This is what inspired my bullet point below - ParsedFile should be smart enough to know which resources have a bad target and which are OK, and appropriately fail only the ones it can't work with.
 

* Make it possible to fail resources from *within* a prefetch. This
  lets a sufficiently smart prefetch handle the case when only some of
  its resources are hosed (parsedfile resources with multiple targets,
  for example, might be only partially unfetchable)

How would this work? A new hook for providers to implement? If so, it'd be nice to flesh out a couple examples use cases so that we can think through the API with something concrete.

It's probably already "doable" by having prefetch mark the returned provider instances as bad in some way, and the provider implementation for flush could raise an exception. I'd prefer to see something at the transaction layer, though. I'll think through what this might look like more.

Well, definitely some charm to not adding new methods that we have to sprinkle respond_to? tests around for. So the former approach might just fly.

I'd also like to think through whether there are additional motivating use cases.

Kylo

-- 

Gareth Rushgrove

unread,
Jul 23, 2015, 8:38:00 AM7/23/15
to puppe...@googlegroups.com
For some relevant colour we hit this problem hard with the AWS module.
We're prefetching a lot of complex data in some cases and we're going
it over the wire - so it going wrong is pretty likely.

We worked around that at the time by creating our own Exception
inherited from Exception which does get caught by Puppet:

https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet_x/puppetlabs/aws.rb#L7-L23

And then raising that in cases of StandardError based exceptions being
raised in prefetch.

https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet/provider/ec2_instance/v2.rb#L34-L36

Some more details in this issue too about the changes in 4.0.

https://jira.puppetlabs.com/browse/PUP-3656

Gareth

> Kylo
>
> --
> Kylo Ginsberg | ky...@puppetlabs.com | irc: kylo | twitter: @kylog
>
> PuppetConf 2015 is coming to Portland, Oregon! Join us October 5-9.
> Register now to take advantage of the Early Bird discount —save $249!
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to puppet-dev+...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-dev/CALsUZFHyPv0mGjeE5Xhn-sv0WYXJ2QbDzY802sBe2JJQR%3DLT5g%40mail.gmail.com.
>
> For more options, visit https://groups.google.com/d/optout.



--
Gareth Rushgrove
@garethr

devopsweekly.com
morethanseven.net
garethrushgrove.com

Trevor Vaughan

unread,
Jul 23, 2015, 9:09:16 AM7/23/15
to puppe...@googlegroups.com
Just out of curiosity, how difficult would it be to defer the resource chain until the end of the catalog if a prefetch fails?

It may not work in all cases, but it's certainly feasable that the prefetch is failing because of some other item in the catalog that hasn't been applied and, given the complexity of the system, the amount of effort that it would involve to proper chain all resources just isn't worth the cost of doing two runs to fix the issue. I'm particularly thinking of things like package repositories here where you really don't want to artificially bind all packages to your repositories for no good reason and/or you may be pulling data from a different source that providers are going to rely on (RH Satellite, Pulp, something else?).

Since it's a prefetch, and nothing has yet changed on the system, could that execution chain just be deferred until just before the post-execution section of the catalog? If this were possible, the impact of a catalog failure would, in theory, be minimized with everything that could be applied happening before those items that could not.

Even then, I think that all resource failures should be localized to the resource, and resource chain, to which they apply. The *best* thing about Puppet is the fact that it does as much as it can on every run. That really shouldn't be broken in my opinion. My classic example is "Puppet applies my security settings across the system, regardless of Apache failing to install or configure". In this case, Puppet should apply my security settings regardless of some random resource having a prefetch failure.

Circling this all the way around to my favorite pet bug that I never have time to work on PUP-4002, in the case where prefetch fails, but the provider (or a dependent of the provider) has a way to handle this later, recording it for all instance of the resource to query later would be a nice to have. I'm thinking of the 'cron' type when I write this since the prefetch phase of cron appears to fail if the crontab file for a user does not exist. But, it doesn't matter in the least because the provider merrily carries on and does what it is supposed to do after raising an alarming, but harmless, error message. (Note: I haven't checked the 4.X behavior on this yet).

Thanks,

Trevor


For more options, visit https://groups.google.com/d/optout.



--

Nan Liu

unread,
Jul 23, 2015, 1:43:17 PM7/23/15
to puppet-dev
On Thu, Jul 23, 2015 at 6:09 AM, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
Just out of curiosity, how difficult would it be to defer the resource chain until the end of the catalog if a prefetch fails?

Rather then retrying I would be happy if prefetch can specify a specific resource dependency. It will resolve some of the issues I've seen where I have to split puppet into multiple catalogs or multiple runs.

Reply all
Reply to author
Forward
0 new messages