Draft for new type and provider API


David Schmitt

Jan 31, 2017, 11:04:27 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi *,

The type and provider API has been the bane of my existence since I [started writing native resources](https://github.com/DavidS/puppet-mysql-old/commit/d33c7aa10e3a4bd9e97e947c471ee3ed36e9d1e2). Now, finally, we'll do something about it. I'm currently working on designing a nicer API for types and providers. My primary goals are to provide a smooth and simple Ruby developer experience for both scripters and coders. Secondary goals are to eliminate server-side code and to make Puppet 4 data types available. Currently this is completely aspirational (i.e. no real code has been written), but early private feedback was encouraging.

To showcase my vision, this [gist](https://gist.github.com/DavidS/430330ae43ba4b51fe34bd27ddbe4bc7) has the [apt_key type](https://github.com/puppetlabs/puppetlabs-apt/blob/master/lib/puppet/type/apt_key.rb) and [provider](https://github.com/puppetlabs/puppetlabs-apt/blob/master/lib/puppet/provider/apt_key/apt_key.rb) ported over to my proposal. The second example there is a more long-term teaser on what would become possible with such an API.

The new API, like the existing one, has two parts: the implementation that interacts with the actual resources, a.k.a. the provider, and a description of what the implementation manages. Due to the different usage patterns of the two parts, they need to be passed to puppet in two different calls:

The `Puppet::SimpleResource.implement()` call receives the `current_state = get()` and `set(current_state, target_state, noop)` methods. `get` returns a list of discovered resources, while `set` takes the target state and enforces those goals on the subject. A single (Ruby) object lives throughout an agent run; it can easily cache data and do whatever else is required for the provider to function well. The state descriptions passed around are simple lists of key/value hashes describing resources. This allows the implementation wide latitude in how to organise itself for simplicity and efficiency.
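As a standalone sketch (no real code exists yet, so the registration mechanics, the returned-changes convention, and the apt_key data below are all assumptions), the shape of such an implementation could look like this:

```ruby
# Standalone sketch of the proposed implement() call. The SimpleResource
# stub below only mimics the registration shape; no real API exists yet,
# so all names and signatures here are assumptions.
module Puppet
  module SimpleResource
    def self.implement(name, &block)
      @implementations ||= {}
      # the block body becomes the provider class; one instance per agent run
      @implementations[name] = Class.new(&block).new
    end

    def self.implementation_for(name)
      @implementations[name]
    end
  end
end

Puppet::SimpleResource.implement('apt_key') do
  def initialize
    # a single object lives for the whole run, so caching is trivial
    @cache = nil
  end

  # returns a list of key/value hashes describing discovered resources
  def get
    @cache ||= [{ name: 'BD611E5C', ensure: 'present' }] # illustrative data
  end

  # enforces target_state, given the current_state reported by get()
  def set(current_state, target_state, noop = false)
    changes = target_state.reject { |r| current_state.include?(r) }
    changes.each do |resource|
      # a real implementation would shell out to apt-key here
      puts "updating #{resource[:name]}" unless noop
    end
    changes
  end
end
```

Keeping the provider a plain Ruby object with two methods is what makes caching in `initialize` and other instance state trivial.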

The `Puppet::SimpleResource.define()` call provides a data-only description of the Type. This is all that is needed on the server side to compile a manifest. Thanks to puppet 4 data type checking, this will already be much more strict (with less effort) than possible with the current APIs, while providing more automatically readable documentation about the meaning of the attributes.
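Since the definition is data-only, it can be sketched as a plain hash. The exact schema keys (`docs`, `attributes`, `behaviour`) are assumptions modelled loosely on the apt_key example in the gist:

```ruby
# Data-only type definition, sketched as a plain hash. The schema keys
# (:docs, :attributes, :behaviour) are assumptions, not a finalised API.
definition = {
  name: 'apt_key',
  docs: 'Manages the list of trusted APT keys.',
  attributes: {
    ensure: { type: "Enum['present', 'absent']",
              docs: 'Whether the key should exist.' },
    id:     { type: 'String', behaviour: :namevar,
              docs: 'The key fingerprint.' },
    source: { type: 'Optional[String]',
              docs: 'Where to fetch the key from.' },
  },
}

# Because it is plain data, related types can share attributes by merging
# hashes, and tooling can inspect it without running any code:
shared_attributes = { created: { type: 'String', behaviour: :read_only } }
definition[:attributes].merge!(shared_attributes)
```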
 

Details in no particular order:

* All of this should fit on any unmodified puppet4 installation. It is completely additive and optional, at least for now.

* The Type definition
  * It is data-only.
  * Refers to puppet data types.
  * No code runs on the server.
  * This information can be re-used in all tooling around displaying/working with types (e.g. puppet-strings, console, ENC, etc.).
  * Autorelations are restricted to unmodified attribute values and constant values.
  * No more `validate` or `munge`! For the edge cases not covered by data types, runtime checking can happen in the implementation on the agent. There it can use local system state (e.g. different mysql versions have different max table length constraints), and it will only fail the part of the resource tree that depends on this error. There is ample precedent for runtime validation, as most remote resources do not try to replicate the validation their target already performs anyway.
  * It maps 1:1 to the capabilities of PCore, and is similar to the libral interface description (see [libral#1](https://github.com/puppetlabs/libral/pull/2)). This ensures future interoperability between the different parts of the ecosystem.
  * Related types can share common attributes by sharing/merging the attribute hashes.
  * `defaults`, `read_only`, and similar data about attributes in the definition are mostly aesthetic at the current point in time, but will make for better documentation, and allow more intelligence built on top of this later.
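To illustrate the runtime-checking point above: a validation that depends on local system state could live in the implementation as an ordinary method. Everything here (names, version strings, limits) is hypothetical, not real MySQL behaviour:

```ruby
# Hypothetical sketch of agent-side runtime validation: the check depends
# on local system state (which flavour is installed), so it cannot run on
# the server. The flavour names and limits are placeholders.
TABLE_NAME_LIMITS = { 'legacy' => 30, 'current' => 64 }.freeze

def check_table_name!(name, flavour)
  max = TABLE_NAME_LIMITS.fetch(flavour)
  return name if name.length <= max
  # raising here fails only the resources dependent on this one,
  # not the whole catalog compilation
  raise ArgumentError, "table name #{name.inspect} exceeds #{max} characters"
end
```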

* The implementation consists of two simple functions: `current_state = get()`, and `set(current_state, target_state, noop)`.
  * `get` on its own is already useful for many things, like `puppet resource`.
  * `set` receives the current state from `get`. While this is necessary for proper operation, there is a race condition if the system state changes between the two calls. This is no different from what current implementations face, and they are well-equipped to deal with it.
  * `set` is called with a list of resources, and can do batching if it is beneficial. This is not yet supported by the agent.
  * the `current_state` and `target_state` values are lists of simple data structures built up of primitives like strings, numbers, hashes and arrays. They match the schema defined in the type.
  * Calling `r.set(r.get, r.get)` would ensure the current state. This should run without any changes, proving the idempotency of the implementation.
  * The ruby instance hosting the `get` and `set` functions is only alive for the duration of an agent transaction. An implementation can provide an `initialize` method to read credentials from the system, and set up other things as required. This single object is used for all instances of the resource type.
  * There is no direct dependency on puppet core libraries in the implementation.
    * While implementations can use utility functions, they are completely optional.
    * The dependencies on the `logger`, `commands`, and similar utilities can be supplied by a small utility library (TBD).
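The `r.set(r.get, r.get)` idempotency check from the list above can be captured in a tiny harness. This assumes, as the sketches here do, that `set` returns the list of changes it would make:

```ruby
# Tiny harness for the idempotency check described above: enforcing the
# state reported by get() should yield no changes. It assumes the sketch
# convention that set() returns the list of changes it would make.
def assert_idempotent(impl)
  changes = impl.set(impl.get, impl.get, true) # noop mode: only report
  raise "implementation is not idempotent: #{changes.inspect}" unless changes.empty?
  true
end

# A trivial stand-in implementation to exercise the harness:
class StubImplementation
  def get
    [{ name: 'a', ensure: 'present' }]
  end

  def set(current_state, target_state, _noop = false)
    target_state.reject { |r| current_state.include?(r) }
  end
end
```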

* Having a well-defined small API makes remoting, stacking, proxying, batching, interactive use, and other shenanigans possible, which will make for an interesting time ahead.

* The logging of updates to the transaction is only a sketch. See the usage of `logger` throughout the example. I've tried different styles for fit.
  * the `logger` is the primary way of reporting information back to the log and the report.
  * results can be streamed for immediate feedback
  * block-based constructs allow detailed logging with little code ("Started X", "X: Doing Something", "X: Success|Failure", with one or two calls, and only one reference to X)
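A block-based logging construct along those lines might look like the following sketch; the class and method names are invented for illustration:

```ruby
# Sketch of a block-based logging construct ("Started X", "X: Success" or
# "X: Failure" from a single call). The class and method names are
# assumptions; the real logger interface is still only a sketch.
class TransactionLogger
  def initialize(io = $stdout)
    @io = io
  end

  # report a single attribute change to the log and the report
  def attribute_change(resource, attribute, from, to)
    @io.puts "#{resource}: #{attribute} changed #{from.inspect} to #{to.inspect}"
  end

  # one call logs start, success, and failure for a whole update
  def updating(resource)
    @io.puts "Started #{resource}"
    yield
    @io.puts "#{resource}: Success"
  rescue StandardError => e
    @io.puts "#{resource}: Failure: #{e.message}"
    raise
  end
end
```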
 
* Obviously this is not sufficient to cover everything existing types and providers are able to do. For the first iteration we are choosing simplicity over functionality.
  * Generating more resource instances for the catalog during compilation (e.g. file#recurse or concat) becomes impossible with a pure data-driven Type. There is still space in the API to add server-side code.
  * Some resources (e.g. file, ssh_authorized_keys, concat) cannot or should not be prefetched. While it might not be convenient, a provider could always return nothing from `get()` and do a more customised enforcement in `set()`.
  * With current puppet versions, only "native" data types will be supported, as type aliases do not get pluginsynced. Yet.
  * With current puppet versions, `puppet resource` can't load the data types, and therefore will not be able to take full advantage of this. Yet.
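The "return nothing from `get()`" escape hatch mentioned above could look like this sketch of a file-content implementation (illustrative only):

```ruby
# Sketch of the "no prefetch" pattern: get() reports nothing, and set()
# does all inspection and enforcement itself. The file-content example
# is illustrative only, not a real provider.
class ContentImplementation
  def get
    [] # deliberately no prefetch: enumerating every file is impractical
  end

  def set(_current_state, target_state, noop = false)
    target_state.each do |resource|
      path   = resource[:path]
      wanted = resource[:content]
      actual = File.exist?(path) ? File.read(path) : nil
      next if actual == wanted # already in the desired state
      File.write(path, wanted) unless noop
    end
  end
end
```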

* There is some "convenient" infrastructure (e.g. parsedfile) that needs porting over to this model.

* Testing becomes possible on a completely new level. The test library can know how data is transformed outside the API, and - using the shape of the type - start generating test cases, and checking the actions of the implementation. This will require developer help to isolate the implementation from real systems, but it should go a long way towards reducing the tedium in writing tests.


What do you think about this?


Cheers, David


Trevor Vaughan

Feb 1, 2017, 9:51:02 PM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi David,

Most of this looks fine (I still need to digest some of it) but I don't know that I'm a huge fan of no munge or validate functionality. This seems to fly in the face of a 'fail fast' mentality.

Also, if everything happens on the client, some of the catalog munging that I do on the fly isn't going to work any longer.

I *do* like the idea of types being pure data but will this bloat the catalog even further?

I still think that moving toward a heap/stack model would serve the Puppet clients best and get past the resource count limitations.

Fundamentally, I want my clients to do as little work as possible since they're meant to be running business processes.

I'll definitely be keeping an eye on this conversation though.

Thanks,

Trevor

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CALF7fHaJdvPrkqRQEMqEgLSUvOy-O4DuL-iNsLrPt74HY7djvw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Trevor Vaughan
Vice President, Onyx Point, Inc

-- This account not approved for unencrypted proprietary information --

David Schmitt

Feb 2, 2017, 8:42:22 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 2 February 2017 at 02:50, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
Hi David,

Most of this looks fine (I still need to digest some of it) but I don't know that I'm a huge fan of no munge or validate functionality. This seems to fly in the face of a 'fail fast' mentality.

With puppet 4 data types, much of the trivial work that munge and validate do can be handled more easily and consistently than before. Many other checks will never be possible on the server, because they depend on values only available on the agent (credential validity, max table name length depending on the mysql flavour), or would be error-prone duplication of checks the target API already does for us (validity of file mode bit combinations, validity of AMI references when talking to EC2 instances, etc.). There is a bit of stuff in between those two (e.g. content & source both specified on a file), which we'd trade off for a much simpler server side. Failing on the agent also means that independent parts of the catalog can move forward, instead of failing the whole compilation.
 
Also, if everything happens on the client, some of the catalog munging that I do on the fly isn't going to work any longer.

Those resources will not look any different from existing types (which will stay as they are for a considerable while). Since I don't know what catalog munging you're doing, I'm having a hard time following here.

I *do* like the idea of types being pure data but will this bloat the catalog even further?

The resources in the catalog won't look any different than today. In fact, the first version will most likely just call Puppet::Type.newtype() under the hood somewhere.
 
I still think that moving toward a heap/stack model would serve the Puppet clients best and get past the resource count limitations.

Do you have those ideas written up somewhere?
 
Fundamentally, I want my clients to do as little work as possible since they're meant to be running business processes.

Same.
 

I'll definitely be keeping an eye on this conversation though.

Thanks,

Thanks for your time and work you're putting in here.


David
 

Trevor Vaughan

Feb 2, 2017, 10:38:40 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
I think I'm slowly talking myself into this.

The issue that I've had is that the current Data Types aren't composable. There's a ticket open about it and hopefully that will fix a lot of my issues.

Unfortunately, it looks like my heap/stack discussion was with Henrik on Slack, so that's not super helpful.

Basically, the idea is that, instead of using a pure object model, objects would be created *as necessary* and the linear execution model would be kept in a simple array (stack). The data space (heap) would contain the attributes that are required for each object to execute.

This means that the data structure would be extremely efficient and there would be almost no cost to building and manipulating an object graph.

The benefit of the object graph is to keep the object model. However, the object model is simply not necessary and has proven not to scale to 10k+ resources. The Heap/Stack model should scale based on the amount of data that you have and would be *extremely* portable to other languages.

The biggest downside (for me) is that I manipulate the catalog graph at run time for various reasons. There would probably be an easy way to work this into the API though, since it would just be an array insertion.

In terms of the validate/munge question, munging can *absolutely* be done client side. But do you even want to pass a catalog to the client if you have invalid data that might be used by other resources? Also, I do quite a bit of checking based on the values of other parameters/properties. If this could be maintained reasonably, that would be fine.

I do think that having a global data space would be extremely important to making this feasible. There are a couple of tickets on for this as well but I don't know the numbers off hand. And, of course, if we use the heap/stack model, the data is all in place (but may change based on resource processing).

Hopefully this all makes sense.

Thanks,

Trevor



David Schmitt

Feb 2, 2017, 11:20:12 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 2 February 2017 at 15:38, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
I think I'm slowly talking myself into this.

The issue that I've had is that the current Data Types are composable. There's a ticket open about it and hopefully that will fix a lot of my issues.

Assuming you're talking about PUP-7033, there is nothing much I can help you with there.


Unfortunately, it looks like my heap/stack discussion was with Henrik on Slack, so that's not super helpful.

Basically, the idea is that, instead of using a pure object model, objects would be created *as necessary* and the linear execution model would be kept in a simple array (stack). The data space (heap) would contain the attributes that are required for each object to execute.

This means that the data structure would be extremely efficient and there would be almost no cost to building and manipulating an object graph.

The benefit of the object graph is to keep the object model. However, the object model is simply not necessary and has proven not to scale to 10k+ resources. The Heap/Stack model should scale based on the amount of data that you have and would be *extremely* portable to other languages.

The agent requires the whole graph for its operation. Since the v1 of this will only build on top of the existing APIs, I expect the (memory) situation to get worse before it'll get better. The new API itself doesn't need a huge object model for its operation. The resources are passed around as simple arrays and hashes. That could be much more compact than the current glut of Puppet::Provider instances.
 
The biggest down side (for me) is that I manipulate the catalog graph at run time for various reasons. There would probably be an easy way to work this into the API though since it would just be an array insertion.

Yeah, things like file#recurse, or what concat_file does with `generate` would not be possible with this new API. That is a deliberate decision to skip a complex fringe feature. I'd love to see what you're doing with it, to get a better handle of how to re-introduce this in a more pleasant form. Within the next few years, you'll still be able to just use the puppet 2 T&P API as-is, if you need to.
 
In terms of the validate/munge question, munging can *absolutely* be done client side. But do you even want to pass a catalog to the client if you have invalid data that might be used by other resources. Also, I do quite a bit of checking based on the values of other parameters/properties. If this could be maintained reasonably, that would be fine.

The hard and generic case of error handling is dynamic errors during application of the catalog. Puppet will always have to deal with those gracefully. Are there some validations that could be done earlier than during the update? Absolutely. How much does it buy us, though? If you have pointers to cool things you're currently doing, I'd be very interested in seeing those. No worries about timing. Due to travels, this proposal won't go anywhere in the next two weeks. And I'd rather have the full picture anyways.
 
I do think that having a global data space would be extremely important to making this feasible. There are a couple of tickets on for this as well but I don't know the numbers off hand. And, of course, if we use the heap/stack model, the data is all in place (but may change based on resource processing).

Having a proper place to do global transformations on the catalog both agent and server side would be beneficial to many advanced use-cases. As I said above, the current way to do so will not go away anytime soon. This Resource API just won't make it any less painful.
 

Hopefully this all makes sense.

Very much so. I hope the same.


Cheers, David

Andreas Zuber

Feb 2, 2017, 11:28:51 AM
to puppet...@googlegroups.com
Serious question:

> Puppet::SimpleResource.implement('apt_key')

What is the benefit of a construct like this compared to a regular ruby
class where it will be completely obvious to every ruby programmer how
inheritance works and what actually happens if I use a mixin?

Martin Alfke

Feb 3, 2017, 6:12:30 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi David,

many thanks for sharing your ideas.

As you already know: personally I don’t say that types and providers are way too difficult.
It is more a matter on how to explain them.
I will dig through your document during weekend. We can have a chat on Monday in Ghent.

But besides this:
is your idea somehow related to the libral project?

Best,
Martin



David Schmitt

Feb 3, 2017, 7:23:55 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 3 February 2017 at 11:12, Martin Alfke <tux...@gmail.com> wrote:
Hi David,

many thanks for sharing your ideas.

As you already know: personally I don’t say that types and providers are way to difficult.
It is more a matter on how to explain them.
I will dig through your document during weekend. We can have a chat on Monday in Ghent.

But besides this:
is your idea somehow related to the libral project?


 
  * It maps 1:1 to the capabilities of PCore, and is similar to the libral interface description (see [libral#1](https://github.com/puppetlabs/libral/pull/2)). This ensures future interoperability between the different parts of the ecosystem.

libral is a neighbouring project with its own set of distinct design goals. Staying bidirectionally compatible is a necessary prerequisite for this proposal. (Which can be handled on both sides, depending on a wide variety of things.)

Cheers, David

Trevor Vaughan

Feb 3, 2017, 8:01:17 PM
to puppe...@googlegroups.com, puppet...@googlegroups.com
I'm still thinking about a lot of this but one thing that I will ask is that you don't drop the actual compiled catalog (in whatever form).

One of the reasons that Puppet is good for compliance-focused architectures is that I can easily hook into it to validate input on the fly.


Thanks,

Trevor


Martin Alfke

Feb 5, 2017, 8:35:27 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi David,

I found some time digging into your proposal.
It seems as if we are following completely different approaches. I hope that my email will not upset or offend you.

Usually I hear a lot of people complaining that types and providers are the most complex thing to do with Puppet.
Especially since the DSL is easy to read and understand.
When I thought about “making types and providers more easy to use” I always think about the simplicity of Puppet DSL and never thought about a new ruby API.

I am happy to chat with you tomorrow. But I don’t see how types and providers will be easier to use with your proposed API.
It seems to me as if you are “hiding” some things which are hard to understand - similar to mk_resource_methods, which is useful when following some silent assumptions.
I always prefer Puppet to be as simple as possible which means: do everything as obvious as possible.

Different to your proposal I thought whether it would be a possible way to reuse Puppet DSL for types and providers.
Maybe it is a silly idea. I am interested in your opinion.

Think of something like the following (yes, I know, “type” is already a reserved word):

type my_module::my_file (
  Enum['file', 'absent' ] $ensure,
  Stdlib::Absolute_path   $path = $title,
  String                  $owner,
){
  def create {
    exec { "create file ${path}":
      command => "touch ${path}",
      path    => '/usr/bin:/bin',
      creates => $path,
    }
  }
  def destroy {
    file { $path:
      ensure => absent,
    }
  }
  property owner {
    exec { "chown ${owner} on ${path}":
      command => "chown ${owner} ${path}",
      path    => '/bin:/usr/bin',
      unless  => '',
    }
  }
}

(I know that exec should not be used in this way.)


Always think on what users want to do:

People with ruby experience will not find types and providers difficult (in the way they are done today).

But Puppet was originally written for Ops people who don’t like scripting or coding.
They were happy having a readable, easy to understand DSL instead.
These are the people for whom you want to make types and providers easier.

Looking forward to seeing you tomorrow or later this evening.

Best,
Martin

Corey Osman

Feb 5, 2017, 3:53:10 PM
to Puppet Developers, voxp...@groups.io, puppet...@googlegroups.com
One improvement I would like to see with the provider API is around getting the current state.  I see many beginners, including myself, trying to add way more complexity than required in the `exists?` method.

One would think that, in order to figure out if the resource exists, you should just query the system and DIY.  And I'll pick on my own code as an example: https://github.com/logicminds/bmclib/blob/0.1.0/lib/puppet/provider/bmc/ipmitool.rb#L34

However, this is not the case.  This is the exact opposite of what you need to do. Instead you need to query the API which is hard to understand and not obvious at first because instinct tells you to get the state yourself.


Another improvement which I hope would get addressed is self.prefetch.  While self.instances is pretty simple to understand, the prefetch method could probably use a better name and an easier way to implement; it was not obvious at first how to implement it. So many people don't implement these two methods when they should, which would make their providers more useful.


The type checking looks pretty amazing and simple to understand and implement in the type code.

Additionally, as Trevor mentioned, I also have some projects (puppet-debugger and puppet-retrospec) that utilize the current Puppet API to introspect resources and query the catalog, so please make sure this new code doesn't break my projects.   I'll be happy to test out any new API for third party compatibility.


Corey  (NWOps)

David Lutterkort

Feb 6, 2017, 5:51:47 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi Martin,

On Sun, Feb 5, 2017 at 2:35 PM, Martin Alfke <tux...@gmail.com> wrote:
 
It seems to me as if you are “hiding” some things which are hard to understand - similar to mk_resource_methods which is useful when following some silent assumptions.

My big hope is that we can do away with all that internal machinery and replace it with a very simple interface between types and providers (roughly: look up existing resources and change resources, what David has as set and get in his proposal)
 
Think of something like the following (yes, I know, “type” is already a reserved word):

type my_module::my_file (
  Enum['file', 'absent' ] $ensure,
  Stdlib::Absolute_path   $path = $title,
  String                  $owner,
){
  def create {
    exec { "create file ${path}":
      command => "touch ${path}",
      path    => '/usr/bin:/bin',
      creates => $path,
    }
  }
  def destroy {
    file { $path:
      ensure =>  absent,
    }
  }
  property owner {
    exec { "chown ${owner} on ${path}":
      command => "chown ${owner} ${path}",
      path    => '/bin:/usr/bin',
      unless  => '',
    }
  }
}

(I know that exec should not be used in this way.)

I think it's totally reasonable for people to write Puppet code that acts like a type/provider; but that's possible today as you could turn the above code into a defined type.

To me, the point of the type/provider mechanism is to allow extending what Puppet can manage beyond what's easily accessible via Puppet. I think that simplification here means that the provider is exposed to a larger-scale change, i.e. is told to enforce a particular resource. Looking, for example, at this systemd provider or this dnf provider  makes me think that breaking change into small pieces (per property) actually makes it harder to write providers, not easier. There's clearly room for adding some language-specific conveniences, but even without that I think the code is pretty straightforward.

People with ruby experience will not find types and providers difficult (in the way they are done today).

But Puppet was originally written for Ops people who don’t like scripting or coding.

Yes, this is a really thorny one - my hunch is that writing types and providers will always be an advanced topic, and it's therefore reasonable to assume some familiarity with scripting.

David

David Schmitt

Feb 7, 2017, 5:31:22 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi Corey,

thanks for looking into this. I've been travelling so accept my apologies for a late answer.


On 5 February 2017 at 20:52, Corey Osman <co...@logicminds.biz> wrote:
One improvement I would like to see with the provider API is around getting the current state.  I see many beginners including myself trying to add way more complexity than required in the exists? method.

One would think that you in order to figure out if the resource exists that you should just query the system and DIY.  And I'll pick on my own code as an example: https://github.com/logicminds/bmclib/blob/0.1.0/lib/puppet/provider/bmc/ipmitool.rb#L34

However, this is not the case.  This is the exact opposite of what you need to do. Instead you need to query the API which is hard to understand and not obvious at first because instinct tells you to get the state yourself.


Another improvement which I hope would get addressed is self.prefetch.  While self.instances is pretty simple to understand the prefetch method could probably use a better name and an easier way to implement. It was not obvious at first how to implement. So many people don't implement these two methods when they should making their provider more useful.

That is exactly what this proposal should address. The implementation of `get` is very straightforward (and, currently, required).

The type checking looks pretty amazing and simple to understand and implement in the type code.

Thanks! I hope that a provider author would never have to "implement" this. The SimpleResource will create a proper Type class underneath with all the validation and munging generated.

 

Additionally, As Trevor mentioned I also have some projects (puppet-debugger and puppet-retrospec) that utilize the current Puppet API to introspec resources and query the catalog please so make sure this new code doesn't break my projects.   I'll be happy to test out any new API for third party compatibility. 

Thanks for the offer! For the time being, the implementation will sit completely on top of the unmodified puppet code to work on all puppet 4 versions out there. I also don't expect existing types to switch immediately, but I do hope that the new API makes implementations so much easier that people will switch quickly.

Then it will be a matter of filling in the gaps for the special cases.

Cheers, David S

 


Thomas Hallgren

Feb 7, 2017, 5:44:50 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Like Martin Alfke above, I'm very interested in what could be done using the Puppet Language alone. What use cases can be covered entirely that way? What is not feasible to ever do with PL? Would an API that made it possible to write types and providers in PL and hand over tiny tasks to another language (Ruby, C, Go, etc.) be of interest? I.e. a clean-cut boundary between the things possible (appropriate) to implement with PL and the things that aren't?

Perhaps this is a first step in that direction?

- thomas


David Schmitt

Feb 7, 2017, 6:33:59 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 7 February 2017 at 10:44, Thomas Hallgren <thomas....@puppet.com> wrote:
Like Martin Afke above, I'm very interested in what could be done using the Puppet Language alone. What use cases can be covered entirely that way?

As David L pointed out, many of these things can be done today in defines, or classes, already.

What is not feasible to ever do with PL?

Talk to external APIs.
 
Would an API that made it possible to write types and providers PL and hand over tiny tasks to another language (Ruby, C, Go, etc.) be of interest? I.e. a clean cut boundary between the things possible (appropriate) to implement with PL and the things that aren't?

I do not understand how extending the puppet dsl on the server helps with resource management on the agent.

 
Perhaps this is a first step in that direction?

In one possible reading, types already are an extension point of the DSL, and this proposal makes that less painful and easier to reason about.

Cheers, David

PS: If anyone is at ConfigurationManagementCamp in Gent these days, hit me up to talk about this!

 

Thomas Hallgren

Feb 7, 2017, 7:30:22 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On Tue, Feb 7, 2017 at 12:33 PM, David Schmitt <david....@puppet.com> wrote:

I do not understand how extending the puppet dsl on the server helps with resource management on the agent.

A precursor to this would of course be to use the parser on the agent, and also have an evaluator tailored for the agent. We are already doing this with Pcore types (parse+eval on the agent), which means that types and providers can use them. This could be viewed as a continuation of that work.

 

 
Perhaps this is a first step in that direction?

In one possible reading, types already are an extension point of the DSL, and this proposal makes that less painful and easier to reason about.

yes, but it is still a) locked to Ruby, and b) puts the demand on the user to learn an additional language.

- thomas

 

jcbollinger

unread,
Feb 7, 2017, 12:13:43 PM2/7/17
to Puppet Users, puppe...@googlegroups.com, voxp...@groups.io


On Tuesday, January 31, 2017 at 10:04:27 AM UTC-6, David Schmitt wrote:

The type and provider API has been the bane of my existence since I [started writing native resources](https://github.com/DavidS/puppet-mysql-old/commit/d33c7aa10e3a4bd9e97e947c471ee3ed36e9d1e2).


Honestly, I've never thought the type / provider system was too bad.  Certainly it works pretty well for simple things.  Now, I'm more of a Dev who does Ops than an Op who Devs, but from my perspective, the biggest problem with Puppet's types & providers has always been inadequate documentation.  The docs in this area are a bit better now than when I started with Puppet around eight years ago, but they have always missed presenting many relevant details about how the host Puppet instance, resource instances, and their associated provider instances interact.  This new effort doesn't seem to be targeting any of that.

 
Now, finally, we'll do something about it. I'm currently working on designing a nicer API for types and providers. My primary goals are to provide a smooth and simple ruby developer experience for both scripters and coders. Secondary goals were to eliminate server side code, and make puppet 4 data types available. Currently this is completely aspirational (i.e. no real code has been written), but early private feedback was encouraging.



I'm confused and dubious: how is it that the guiding goals for this effort are all about form rather than function?  Surely, if you intend for the result to serve any particular function(s) then you need to be guided at least in part by which functions those are.  Perhaps for you that's tied up in what you mean by "a type and provider API", but I am confident that some of the details of what that means to you differ from what it means to others.  You really need to be explicit with this if you want good feedback.  Moreover, inasmuch as you seem to imagine this API eventually supplanting the existing one, it may save a lot of future grief and expense to make sure everyone is on the same page early on.

Moreover, none of the goals given address the needs of the API client, which is Puppet itself.  Or do you have a different concept of who the client is?  If so, then you and I have even more different ideas of what a "type and provider API" is than I had supposed.

Furthermore, as wonderful as "a smooth and simple ruby developer experience for both scripters and coders" sounds, I think it's utterly unrealistic.  People who aren't good at writing native code or who just want to avoid it have much different wants and expectations from those who are comfortable with writing native code.  In fact, Puppet already has support for "scripters": this is what defined types are all about.  People who don't want to be writing Ruby are unlikely ever to have a smooth or simple Ruby developer experience.  The most likely outcome of trying to simplify any more than serves coders well is producing something  insufficiently powerful / flexible.

 
The `Puppet::SimpleResource.implement()` call receives the `current_state = get()` and `set(current_state, target_state, noop)` methods. `get` returns a list of discovered resources, while `set` takes the target state and enforces those goals on the subject. There is only a single (ruby) object throughout an agent run, that can easily do caching and what ever else is required for a good functioning of the provider. The state descriptions passed around are simple lists of key/value hashes describing resources. This will allow the implementation wide latitude in how to organise itself for simplicity and efficiency. 



So there is no mechanism for obtaining the current state of an individual resource?  That seems a bit of an oversight.  There are resource types that cannot feasibly be fully enumerated (e.g. files), and even resource types that cannot be enumerated at all.

Also, though you say elsewhere that the provider's set() method receives a list of resources to update, that is not borne out by your example code.

 
The `Puppet::SimpleResource.define()` call provides a data-only description of the Type. This is all that is needed on the server side to compile a manifest. Thanks to puppet 4 data type checking, this will already be much more strict (with less effort) than possible with the current APIs, while providing more automatically readable documentation about the meaning of the attributes.



I'm not at all convinced that your concept of a resource type definition indeed covers everything needed -- or at least wanted -- on the server side.  For example, I'm all for engaging the type system, but
  1. The type system does not cover all potential validation needs.  In particular, it cannot perform joint validation on multiple parameters.
  2. Using the type system for any but pretty simple validity checks will run against your objective of providing a simple interface.
Also, if handling types does not involve any server-side code, does that mean that the type system is not engaged to validate parameters until the data reach the client?  If so, that in itself is a controversial interpretation of what is and is not needed on the server side.

I'm not prepared to offer feedback on the details at this time, because I don't think you've sufficiently established what the effort is trying to accomplish.


John

David Schmitt

unread,
Feb 8, 2017, 9:20:30 AM2/8/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
On 7 February 2017 at 17:13, jcbollinger <John.Bo...@stjude.org> wrote:


On Tuesday, January 31, 2017 at 10:04:27 AM UTC-6, David Schmitt wrote:

The type and provider API has been the bane of my existence since I [started writing native resources](https://github.com/DavidS/puppet-mysql-old/commit/d33c7aa10e3a4bd9e97e947c471ee3ed36e9d1e2).


Honestly, I've never thought the type / provider system was too bad.  Certainly it works pretty well for simple things.  Now, I'm more of a Dev who does Ops than an Op who Devs, but from my perspective, the biggest problem with Puppet's types & providers has always been inadequate documentation.  The docs in this area are a bit better now than when I started with Puppet around eight years ago, but they have always missed presenting many relevant details about how the host Puppet instance, resource instances, and their associated provider instances interact.

What I read here is that the current state is complex enough that a couple of books and a crack team of tech writers couldn't explain it well.
 
  This new effort doesn't seem to be targeting any of that.

Looping back to the "lack of documentation", I see this lack grounded partially in the huge surface area that needs to be covered, since the existing T&P API is deeply entangled into the catalog compilation, and the agent transactions.

This new API has two entry points that take a small number of easily explained arguments, making full documentation realistic. From this (and others') reactions I now understand that I should have started with a more README-driven approach, to highlight the difference more. Let's try this again:

The SimpleResource (working title) consists of three distinct, but cooperating parts. The Definition describes the shape of a resource, its attributes, their types, which is the namevar, and the doc strings. The get method returns a list of hashes, each matching the shape of the resource as defined in the Definition. For example, a list of apt_key resources (here rendered in JSON) would look like this:

[
  {
    "name": "BBCB188AD7B3228BCF05BD554C0BE21B5FF054BD",
    "ensure": "present",
    "fingerprint": "BBCB188AD7B3228BCF05BD554C0BE21B5FF054BD",
    "long": "4C0BE21B5FF054BD",
    "short": "5FF054BD",
    "size": "2048",
    "type": "rsa",
    "created": "2013-06-07 23:55:31 +0100",
    "expiry": null,
    "expired": false
  },
  {
    "name": "B71ACDE6B52658D12C3106F44AB781597254279C",
    "ensure": "present",
    "fingerprint": "B71ACDE6B52658D12C3106F44AB781597254279C",
    "long": "4AB781597254279C",
    "short": "7254279C",
    "size": "1024",
    "type": "dsa",
    "created": "2007-03-08 20:17:10 +0000",
    "expiry": null,
    "expired": false
  },
  ...
]
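To make that shape concrete, a minimal get implementation producing hashes of this form could parse gpg's colon-delimited output. Everything below is an illustrative sketch, not the actual apt_key provider: the sample output is stubbed, and the field positions are assumptions.

```ruby
require 'time'

# Stubbed sample of `gpg --with-colons` style output. The field positions
# used below are illustrative assumptions for this sketch.
SAMPLE = <<~OUT
  pub:-:2048:1:4C0BE21B5FF054BD:1370645731::::::scESC:
  fpr:::::::::BBCB188AD7B3228BCF05BD554C0BE21B5FF054BD:
OUT

def get(output = SAMPLE)
  keys = []
  current = nil
  output.each_line do |line|
    fields = line.chomp.split(':')
    case fields[0]
    when 'pub' # one record per key
      current = {
        ensure:  'present',
        size:    fields[2],
        long:    fields[4],
        short:   fields[4][-8..-1],
        created: Time.at(fields[5].to_i).utc.iso8601,
      }
      keys << current
    when 'fpr' # fingerprint record belongs to the preceding pub record
      current[:name] = current[:fingerprint] = fields[9] if current
    end
  end
  keys
end
```

Each hash returned by get then renders naturally into the JSON list shown above.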


The set method takes a list of target goals in the same format and is responsible for enforcing those goals on the target system. When calling the set method, the caller may provide the current state (as received from a recent get call) to allow the set method to shortcut reading the existing state again. The basic pattern for the set method looks like this:

def set(current_state, target_state, noop = false)
  existing_keys = Hash[current_state.collect { |k| [k[:name], k] }]
  target_state.each do |resource|
    # additional validation for this resource goes here

    current = existing_keys[resource[:name]]
    if current && resource[:ensure].to_s == 'absent'
      # delete the resource
    elsif current && resource[:ensure].to_s == 'present'
      # update the resource
    elsif !current && resource[:ensure].to_s == 'present'
      # create the resource
    end
  end
end


 
Now, finally, we'll do something about it. I'm currently working on designing a nicer API for types and providers. My primary goals are to provide a smooth and simple ruby developer experience for both scripters and coders. Secondary goals were to eliminate server side code, and make puppet 4 data types available. Currently this is completely aspirational (i.e. no real code has been written), but early private feedback was encouraging.



I'm confused and dubious: how is it that the guiding goals for this effort are all about form rather than function?  Surely, if you intend for the result to serve any particular function(s) then you need to be guided at least in part by which functions those are.  Perhaps for you that's tied up in what you mean by "a type and provider API", but I am confident that some of the details of what that means to you differ from what it means to others.  You really need to be explicit with this if you want good feedback.  Moreover, inasmuch as you seem to imagine this API eventually supplanting the existing one, it may save a lot of future grief and expense to make sure everyone is on the same page early on.

Thanks for pointing out where things didn't make it out of my head into this text.  Part of the motivation is hidden in the note around libral. There are a few connected efforts/conversations around making the providers an independent entity that can provide value in more situations than the agent transaction:

* libral is addressing the local runtime environment
* this proposal is more focused on the shape of the data required to "drive" a provider
* remote resources (e.g. AWS, network devices) are becoming more prevalent and might be better served by something other than puppet agent. For example, puppet device is a thing, but is not well integrated into the general ecosystem

The underlying connection is that the functionality of providers (e.g. "how to install a package", or "how to procure a VM in AWS") has value beyond "can be driven in a puppet agent transaction". Easy examples are "puppet resource" as an exploratory tool, and "puppet device" for remote resources. Gareth's puppetlabs-inventory, everyone hooking into PuppetDB, and by extension efforts like augeas, libelektra, and openlmi were all about providing uniform API access to heterogeneous sets of resources.
 
Moreover, none of the goals given address the needs of the API client, which is Puppet itself.  Or do you have a different concept of who the client is?  If so, then you and I have even more different ideas of what a "type and provider API" is than I had supposed.

As argued above, there is value to be unlocked by enabling callers of the Resource API other than puppet agent. That said, the puppet agent will in any reasonable future remain the primary driver for all of this, which also leads to the balancing act of providing an easily callable API that still hooks into the current infrastructure.

 

Furthermore, as wonderful as "a smooth and simple ruby developer experience for both scripters and coders" sounds, I think it's utterly unrealistic.  People who aren't good at writing native code or who just want to avoid it have much different wants and expectations from those who are comfortable with writing native code.  In fact, Puppet already has support for "scripters": this is what defined types are all about.  People who don't want to be writing Ruby are unlikely ever to have a smooth or simple Ruby developer experience.  The most likely outcome of trying to simplify any more than serves coders well is producing something  insufficiently powerful / flexible.

In my book scripters are smart people who build upon existing code snippets (be that from existing pieces, or documentation), while coders take a more formal approach revolving around exploring a wider swath of the surrounding infrastructure and principles. My hope is that a runtime model that can be fully explained in a few sentences (see above), and is more inspectable, will help both groups focus on the real problems they are tasked to solve.

The `Puppet::SimpleResource.implement()` call receives the `current_state = get()` and `set(current_state, target_state, noop)` methods. `get` returns a list of discovered resources, while `set` takes the target state and enforces those goals on the subject. There is only a single (ruby) object throughout an agent run, that can easily do caching and what ever else is required for a good functioning of the provider. The state descriptions passed around are simple lists of key/value hashes describing resources. This will allow the implementation wide latitude in how to organise itself for simplicity and efficiency. 



So there is no mechanism for obtaining the current state of an individual resource?  That seems a bit of an oversight.  There are resource types that cannot feasibly be fully enumerated (e.g. files), and even resource types that cannot be enumerated at all.

As mentioned in the original text: "Some resources (e.g. file, ssh_authorized_keys, concat) cannot or should not be prefetched. While it might not be convenient, a provider could always return nothing on the get() and do a more customized enforce motion in the set()." This is a deliberate tradeoff now, to build a simple, robust foundation for future evolution. Adding an optional `find_by_name` function that can do this is a possible future evolution.
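A sketch of that pattern for a non-enumerable resource might look like the following. The class, the file_line-style shape, and the attribute names are all hypothetical, purely for illustration:

```ruby
# Hypothetical sketch: a file_line-style resource cannot enumerate all
# instances, so get returns nothing, and set inspects the system itself.
class FileLineImpl
  def get
    [] # enumerating every line of every file is not feasible
  end

  def set(_current_state, target_state, noop = false)
    target_state.each do |resource|
      lines = File.exist?(resource[:path]) ? File.readlines(resource[:path], chomp: true) : []
      next if lines.include?(resource[:line]) # already in the desired state
      next if noop                            # noop: report only, change nothing
      File.open(resource[:path], 'a') { |f| f.puts(resource[:line]) }
    end
  end
end
```

The enforcement stays idempotent because set re-reads the actual state before changing anything, which is exactly the "customized enforce motion" described above.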


Also, though you say elsewhere that the provider's set() method receives a list of resources to update, that is not borne out by your example code.

The example code only shows the Resource itself. The scaffolding around it to hook it up into the existing system does not exist yet. On my last flight I wrote a very simple type and provider to forward puppet's API into calls of get/set functions as they are described here, and that worked surprisingly well, so I'm confident now that I can really implement something to dynamically create this scaffolding based on the resource definition.
 
The `Puppet::SimpleResource.define()` call provides a data-only description of the Type. This is all that is needed on the server side to compile a manifest. Thanks to puppet 4 data type checking, this will already be much more strict (with less effort) than possible with the current APIs, while providing more automatically readable documentation about the meaning of the attributes.



I'm not at all convinced that your concept of a resource type definition indeed covers everything needed -- or at least wanted -- on the server side.  For example, I'm all for engaging the type system, but
  1. The type system does not cover all potential validation needs.  In particular, it cannot perform joint validation on multiple parameters.
  2. Using the type system for any but pretty simple validity checks will run against your objective of providing a simple interface.

The first point is true already for most existing non-data-type validations, except for the example you brought up. The second point is the reason I don't propose doing that.
 
Also, if handling types does not involve any server-side code, does that mean that the type system is not engaged to validate parameters until the data reach the client?  If so, that in itself is a controversial interpretation of what is and is not needed on the server side.

No server-side code provided by the developer of the Resource. Of course the puppet compiler will be able to cross-check the types from the define() call with whatever is passed in through the manifest.
 

I'm not prepared to offer feedback on the details at this time, because I don't think you've sufficiently established what the effort is trying to accomplish.


Thank you for your time and work. I really appreciate the honest and detailed feedback!


David

PS: I'm travelling this and next week. While it might take me a day or two to get around to answering, I will process all mails before any decision on this project is made.

David Schmitt

unread,
Feb 27, 2017, 9:59:57 AM2/27/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
Hi folks,

I've written down the design in README format now. This should make it easier to ingest from a module developer's perspective. Meanwhile I've also had a shot at a hardcoded implementation, and it is technically possible to implement this too.

-----------

Resource API

A resource is the basic thing that is managed by puppet. Each resource has a set of attributes describing its current state. Some of the attributes can be changed throughout the lifetime of the resource, some attributes only report back but cannot be changed (see read_only), and others can only be set once during initial creation (see init_only). To gather information about those resources, and to enact changes in the real world, puppet requires a piece of code to implement this interaction. The implementation can have parameters that influence its mode of operation (see operational_parameters). To describe all these parts to the infrastructure, and the consumers, the resource Definition (f.k.a. 'type') contains definitions for all of them. The Implementation (f.k.a. 'provider') contains code to get and set the system state.

Resource Definition

Puppet::ResourceDefinition.register(
    name: 'apt_key',
    docs: <<-EOS,
      This type provides Puppet with the capabilities to manage GPG keys needed
      by apt to perform package validation. Apt has its own GPG keyring that can
      be manipulated through the `apt-key` command.

      apt_key { '6F6B15509CF8E59E6E469F327F438280EF8D349F':
        source => 'http://apt.puppetlabs.com/pubkey.gpg'
      }

      **Autorequires**:
      If Puppet is given the location of a key file which looks like an absolute
      path this type will autorequire that file.
    EOS
    attributes:   {
        ensure:      {
            type: 'Enum[present, absent]',
            docs: 'Whether this apt key should be present or absent on the target system.'
        },
        id:          {
            type:    'Variant[Pattern[/\A(0x)?[0-9a-fA-F]{8}\Z/], Pattern[/\A(0x)?[0-9a-fA-F]{16}\Z/], Pattern[/\A(0x)?[0-9a-fA-F]{40}\Z/]]',
            docs:    'The ID of the key you want to manage.',
            namevar: true,
        },
        # ...
        created:     {
            type:      'String',
            docs:      'Date the key was created, in ISO format.',
            read_only: true,
        },
    },
    autorequires: {
        file:    '$source', # will evaluate to the value of the `source` attribute
        package: 'apt',
    },
)

The Puppet::ResourceDefinition.register(options) function takes a Hash with the following top-level keys:

  • name: the name of the resource. For autoloading to work, the whole function call needs to go into lib/puppet/type/<name>.rb.
  • docs: a doc string that describes the overall working of the type, gives examples, and explains pre-requisites as well as known issues.
  • attributes: a hash mapping attribute names to their details. Each attribute is described by a hash containing the puppet 4 data type, a docs string, and whether the attribute is the namevar, read_only, and/or init_only.
  • operational_parameters: a hash mapping parameter names to puppet data types and docs strings.
  • autorequires, autobefore, autosubscribe, and autonotify: a Hash mapping resource types to titles. Currently the titles must either be constants, or, if the value starts with a dollar sign, a reference to the value of an attribute.

Resource Implementation

At runtime the current and intended system states for a single resource instance are always represented as ruby Hashes of the resource's attributes, and applicable operational parameters.

The two fundamental operations to manage resources are reading and writing system state. These operations are implemented in the ResourceImplementation as get and set:

Puppet::ResourceImplementation.register('apt_key') do
  def get
    {
      'title' => {
        name: 'title',
        # ...
      },
    }
  end

  def set(current_state, target_state, noop: false)
    target_state.each do |title, resource|
      # ...
    end
  end
end

The get method returns a Hash of all resources currently available, keyed by their title. If the get method raises an exception, the implementation is marked as unavailable during the current run, and all resources of its type will fail. The error message will be reported to the user.

The set method updates resources to the state defined in target_state. For convenience, current_state contains the last available system state from a prior get call. When noop is set to true, the implementation must not change the system state, but only report what it would change.

Runtime Environment

The primary runtime environment for the implementation is the puppet agent, a long-running daemon process. The implementation can also be used in the puppet apply command, a one-shot version of the agent, or the puppet resource command, a short-lived CLI process for listing or managing a single resource type. Other callers who want to access the implementation will have to emulate those environments.

In any case the registered block will be surfaced in a clean class which will be instantiated once for each transaction. The implementation can define any number of helper methods to support itself. To allow a transaction to set up the prerequisites for an implementation and use it immediately, the provider is instantiated as late as possible. A transaction will usually call get once, and may call set any number of times to effect change.

The host object can be used to cache ephemeral state during execution. The implementation should not try to cache state beyond the transaction, to avoid interfering with the agent daemon. In many other cases caching beyond the transaction won't help anyway, as the hosting process will only manage a single transaction.
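Since the object lives exactly as long as one transaction, plain instance variables are the natural place for that ephemeral cache. A minimal sketch, with a stubbed system read and a call counter added purely for illustration:

```ruby
# Hypothetical sketch: memoize the expensive system read for the duration
# of a single transaction; a new transaction gets a fresh object and cache.
class AptKeyImpl
  attr_reader :reads # instrumentation for this example only

  def initialize
    @reads = 0
  end

  def get
    @cache ||= expensive_system_read # computed at most once per object
  end

  private

  def expensive_system_read
    @reads += 1
    [{ name: 'dummy', ensure: 'present' }] # stubbed system state
  end
end
```

A transaction calling get repeatedly only pays the cost once, and the daemon never sees stale data because the object is discarded together with the transaction.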

Utilities

The runtime environment provides some utilities to make the implementation's life easier, and provide a uniform experience for its users.

Commands

To use CLI commands in a safe and comfortable manner, the implementation can use the commands method to access shell commands. You can either use a full path, or a bare command name. In the latter case puppet will use the system PATH setting to search for the command. If the commands are not available, an error will be raised and the resources will fail in this run. The commands are aware of whether noop is in effect or not, and will skip the actual execution if necessary.

Puppet::ResourceImplementation.register('apt_key') do
  commands apt_key: '/usr/bin/apt-key'
  commands gpg: 'gpg'

This will create methods called apt_key, and gpg, which will take CLI arguments without any escaping, and run them in a safe environment (clean working directory, clean environment). For example, to call apt-key to delete a specific key by id:

apt_key 'del', key_id

By default the stdout of the command is logged to debug, while the stderr is logged to warning. To access the stdout in the implementation, use the command name with _lines appended, and process it through the returned Enumerable line-by-line. For example, to process the list of all apt keys:

apt_key_lines(%w{adv --list-keys --with-colons --fingerprint --fixed-list-mode}).collect do |line|
  # process each line here, and return a result
end

Note: the output of the command is streamed through the Enumerable. If the implementation requires the exit value of the command before processing, or wants to cache the output, use to_a to read the complete stream in one go.

If the command returns a non-zero exit code, an error is signalled to puppet. If this happens during get, all managed resources of this type will fail. If this happens during a set, all resources that have been scheduled for processing in this call, but have not yet been marked as a success, will be marked as failed. To avoid this behaviour, call the try_-prefixed variant. In this (hypothetical) example, apt-key signals already deleted keys with an exit code of 1, which is OK when the implementation is trying to delete the key:

try_apt_key 'del', key_id

if [0, 1].include? $?.exitstatus
  # success, or already deleted
else
  # fail
end

The exit code is signalled through the ruby standard variable $? as a Process::Status object.

Logging and Reporting

The implementation needs to signal changes, successes and failures to the runtime environment. The logger provides a structured way to do so.

General messages

To provide feedback about the overall operation of the implementation, the logger provides the usual set of loglevel methods that take a string, and pass that up to puppet's logging infrastructure:

logger.warning("Unexpected state detected, continuing in degraded mode.")

will result in the following message:

Warning: apt_key: Unexpected state detected, continuing in degraded mode.
  • debug: detailed messages to understand everything that is happening at runtime; only shown on request
  • info: high level progress messages; especially useful before long-running operations, or before operations that can fail, to provide context to interactive users
  • notice: use this loglevel to indicate state changes and similar events of notice from the regular operations of the implementation
  • warning: signal error conditions that do not (yet) prohibit execution of the main part of the implementation; for example deprecation warnings, temporary errors.
  • err: signal error conditions that have caused normal operations to fail
  • critical/alert/emerg: should not be used by resource implementations

See Wikipedia and RFC 5424 for more details.

Logging contexts

Most of an implementation's messages are expected to be relative to a specific resource instance, and a specific operation on that instance. For example, to report the change of an attribute:

logger.attribute_changed(title:, attribute:, old_value:, new_value:, message: "Changed #{attribute} from #{old_value.inspect} to #{new_value.inspect}")

To enable detailed logging without repeating key arguments, and provide consistent error logging, the logger provides logging context methods that capture the current action and resource instance.

logger.updating(title: title) do
  if key_not_found
    logger.warning('Original key not found') 
  end

  # Update the key by calling CLI tool
  apt_key(...)

  logger.attribute_changed(
    attribute: 'content', 
    old_value: nil,
    new_value: content_hash, 
    message: "Created with content hash #{content_hash}")
end

will result in the following messages (of course, with the #{} sequences replaced by the true values):

Debug: Apt_key[#{title}]: Started updating
Warning: Apt_key[#{title}]: Updating: Original key not found
Debug: Apt_key[#{title}]: Executing 'apt-key ...'
Debug: Apt_key[#{title}]: Successfully executed 'apt-key ...'
Notice: Apt_key[#{title}]: Updating content: Created with content hash #{content_hash}
Notice: Apt_key[#{title}]: Successfully updated
# TODO: update messages to match current log message formats for resource messages

In the case of an exception escaping the block, the error is logged appropriately:

Debug: Apt_key[#{title}]: Started updating
Warning: Apt_key[#{title}]: Updating: Original key not found
Error: Apt_key[#{title}]: Updating failed: #{exception message}
# TODO: update messages to match current log message formats for resource messages

Logging contexts process all exceptions. StandardErrors are assumed to be regular failures in handling a resource, and they are swallowed after logging. Everything else is assumed to be a fatal application-level issue, and is passed up the stack, ending execution. See the ruby documentation for details on which exceptions are not StandardErrors.

The equivalent long-hand form with manual error handling:

logger.updating(title: title)
begin
  if key_not_found
    logger.warning(title: title, message: 'Original key not found')
  end

  # Update the key by calling CLI tool
  try_apt_key(...)

  if $?.exitstatus != 0
    logger.error(title: title, message: "Failed executing apt-key #{...}")
  else
    logger.attribute_changed(
      title:     title,
      attribute: 'content',
      old_value: nil,
      new_value: content_hash,
      message:   "Created with content hash #{content_hash}")
  end
  logger.changed(title: title)
rescue Exception => e
  logger.error(title: title, exception: e, message: 'Updating failed')
  raise unless e.is_a? StandardError
end

This example is only for demonstration purposes. In the normal course of operations, implementations should always use the utility functions.

Logging reference

The following action/context methods are available:

  • creating(title:, message: 'Creating', &block)
  • updating(title:, message: 'Updating', &block)
  • deleting(title:, message: 'Deleting', &block)
  • attribute_changed(title:, attribute:, old_value:, new_value:, message: nil)

  • created(title:, message: 'Created')

  • updated(title:, message: 'Updated')
  • deleted(title:, message: 'Deleted')

  • fail(title:, message:) - abort the current context with an error


--------------------



Regards, David

 

David Schmitt

unread,
Feb 28, 2017, 10:59:39 AM2/28/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io


On 27 February 2017 at 18:57, Shawn Ferry <shawn...@oracle.com> wrote:

On Feb 27, 2017, at 9:59 AM, David Schmitt <david....@puppet.com> wrote:

Commands

To use CLI commands in a safe and comfortable manner, the implementation can use the commands method to access shell commands. You can either use a full path, or a bare command name. In the latter case puppet will use the system PATH setting to search for the command. If the commands are not available, an error will be raised and the resources will fail in this run. The commands are aware of whether noop is in effect or not, and will skip the actual execution if necessary. 

Puppet::ResourceImplementation.register('apt_key') do
  commands apt_key: '/usr/bin/apt-key'
  commands gpg: 'gpg'

This will create methods called apt_key, and gpg, which will take CLI arguments without any escaping, and run them in a safe environment (clean working directory, clean environment). For example, to call apt-key to delete a specific key by id:

If the goal here is safe execution, shouldn't it escape at least shell special characters? My concern is based on some of the providers I work on, which take free-form strings as arguments, and on any type that doesn't fully validate or munge its inputs.

There is a risk that any provider* today can suffer from shell command injection. It seems that using shelljoin on the argument array, or shellescape on each element, would be useful. However, for a simplified API it seems that this could be done more centrally here, instead of pushing it to each implementation to make sure it is done if needed.

* That can't exclude shell characters and doesn't munge input to escape them.

The trick is to not use system(), which calls through the shell, but the array variant of spawn() or popen(), which doesn't go through a shell at all. Not only does that perform better, since fewer binaries are loaded, it also eliminates all shell parsing from the call chain. Having this wrapped up in the API as the primary mechanism for calling CLI commands is intended to help people get this right, instead of leaving them to find the right ruby incantation themselves.
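A minimal demonstration of the difference, with echo standing in for apt-key and a hypothetical injection payload:

```ruby
require 'open3'

malicious = 'KEYID; touch /tmp/pwned'

# String form goes through /bin/sh, so the text after ';' would run as a
# second command:
#   system("apt-key del #{malicious}")    # shell injection
#
# Array form passes each element as a literal argv entry; no shell is involved:
out, _status = Open3.capture2('echo', malicious)
puts out # the payload is printed verbatim, not executed
```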



Cheers, David

Erik Dalén

Feb 28, 2017, 2:35:59 PM2/28/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
Overall I think this looks pretty good, but I have some questions and comments.

For implementations that can't enumerate all existing resources (at least not in any fast way), like for example File, should they define a get method that returns an empty hash, nil, or just not define a get method at all? nil or not defining it would allow special handling of them, so that would probably be preferable.

But shouldn't it really have a way to fetch the state of a single resource for those implementations? How would this otherwise handle something like "puppet resource file /path/to/file"? Traverse the entire file system and stat every file?

It would also be handy to have some helper method that compares current_state & target_state to filter out the changes that actually need to happen. I would imagine that sort of code would otherwise be duplicated in many implementations.

logger.attribute_changed could really have a default value for message: if the implementation provides attribute, old_value & new_value, it shouldn't also need to supply "Changed #{attribute} from #{old_value} to #{new_value}".

Will the attribute type fields be able to use type definitions shipped in the same module or present in the environment?

Will they be able to inspect the catalog or communicate between implementations, like in the concat & concat_fragment case?

Will they be able to generate resources? That could probably be done with something similar to the logger methods, but there is nothing like that in the proposal. I assume they won't have a generate method.

Will there be some requirement that these implementations are thread safe so puppet can start managing multiple resources in parallel? Or will they run in some subprocess even?

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CALF7fHZ-PYgrqboexxDJzzLPgbxidu%2BZs%3DRYHyc85LBOnr4EfA%40mail.gmail.com.

Trevor Vaughan

Mar 1, 2017, 9:00:56 AM3/1/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
Additional comments in response to Erik

On Tue, Feb 28, 2017 at 2:35 PM, Erik Dalén <erik.gus...@gmail.com> wrote:
Overall I think this looks pretty good, but I have some questions and comments.

For implementations that can't enumerate all existing resources (at least not in any fast way), like for example File, should they define a get method that returns an empty hash, nil, or just not define a get method at all? nil or not defining it would allow special handling of them, so that would probably be preferable.

So, this is a fundamental failure of the File resource being too tightly coupled to the Object model of Puppet. We could, relatively easily, get around this but it would require working around the Object model to provide point optimization.

I'm starting to notice that a lot of people are now passing around hashes to prevent spawning a lot of resources and I think that this fundamental issue should be addressed prior to the implementation of another Type/Provider subsystem. It is possible that another system will *need* to be provided to fully address these issues and that would be a change that I could wholeheartedly get behind.
 

But shouldn't it really have a way to fetch the state of a single resource for those implementations? How would this otherwise handle something like "puppet resource file /path/to/file"? Traverse the entire file system and stat every file?

Yes, this is very much needed. The catalog-delving abilities that the community has basically shoved into Puppet are one of the things that help Puppet remain relevant across various environments.
 

It would also be handy to have some helper method that compares current_state & target_state to filter out the changes that actually need to happen. I would imagine that sort of code would otherwise be duplicated in many implementations.

I've had to have trailing resource collectors (as concat does) that simply exist to graft together a bunch of previously declared resources. This really needs to be eliminated, and we need to be able to mark resources as irrelevant to the client, so that their data in particular does not bloat the catalog. This *should* be relatively straightforward with an extension to the existing object model, but I haven't had time to try it.
 

logger.attribute_changed could really have a default value for message: if the implementation provides attribute, old_value & new_value, it shouldn't also need to supply "Changed #{attribute} from #{old_value} to #{new_value}".

Agreed
 

Will the attribute type fields be able to use type definitions shipped in the same module or present in the environment?

I still think that types just aren't enough at this point without arbitrary validation capabilities, and the ability to not repeat my type definitions all over the place.
 

Will they be able to inspect the catalog or communicate between implementations, like in the concat & concat_fragment case?

100% need this. If I can't do this, most of the advanced functionality that I have dies. The catalog is one of the shining points in Puppet's favor in terms of monitoring and compliance. I *need* the ability to dig into it as necessary and I would love to see that ability codified instead of having to re-hack my way into it when it changes. Also, this is something that I need at run time. Waiting until it's stuffed in a random DB somewhere is too late.

From the top of my head, I use this for: Concat, IPTables, CGroups, Kerberos, and Compliance Validation

Losing this would be a *BIG DEAL*
 

Will they be able to generate resources? That could probably be done with something similar to the logger methods, but there is nothing like that in the proposal. I assume they won't have a generate method.

I also use this quite a bit. I know that it's not as easy to debug but it is occasionally necessary to maintain the proper object model and not resort to hacks that may conflict with other resource declarations.
 

Will there be some requirement that these implementations are thread safe so puppet can start managing multiple resources in parallel? Or will they run in some subprocess even?

I'm honestly not so concerned about this. I used to be all for running my catalogs in parallel, but I quickly realized that the slight speed gains I might get (I'll probably be blocked by I/O somewhere anyway) would probably not be worth the headache of trying to debug a parallel catalog gone bad. In my mind, parallelism in Puppet should be thought of at the cross-node level, and the catalogs should be moved to a direct heap/stack implementation to reduce resource usage on the clients and to gain efficiency.
 


David Schmitt

Mar 7, 2017, 2:38:39 PM3/7/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
[Apologies, this got lost in my Drafts folder.]

On 28 February 2017 at 19:35, Erik Dalén <erik.gus...@gmail.com> wrote:
Overall I think this looks pretty good, but I have some questions and comments.

For implementations that can't enumerate all existing resources (at least not in any fast way), like for example File, should they define a get method that returns an empty hash, nil, or just not define a get method at all? nil or not defining it would allow special handling of them, so that would probably be preferable.

But shouldn't it really have a way to fetch the state of a single resource for those implementations? How would this otherwise handle something like "puppet resource file /path/to/file"? Traverse the entire file system and stat every file?

Yup, it should. I've added a short discussion of the problem to the "Known limitations" section. I could add verbiage around the get() method returning nil (as opposed to []) to signal that instances might exist, but that it is not possible to enumerate them.
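A sketch of that convention; the class name and the caller are illustrative, not part of the draft:

```ruby
# Hypothetical implementation that cannot cheaply enumerate its resources.
# Convention discussed here: [] means "enumerated, nothing found", while
# nil means "instances may exist, but enumerating them is not feasible".
class FileImplementation
  def get
    nil # stat-ing every file on disk is impractical
  end
end

state = FileImplementation.new.get
if state.nil?
  puts 'cannot enumerate; fetch individual resources on demand instead'
else
  puts "found #{state.length} resources"
end
```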
 
It would also be handy to have some helper method that compares current_state & target_state to filter out the changes that actually need to happen. I would imagine that sort of code would otherwise be duplicated in many implementations.

True. Porting the existing types and providers from our supported modules is high on my list to validate the design, and will expose the patterns. That'll make it much easier to build a library of helpers.
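Such a helper could be quite small. This is a hypothetical sketch (changed_resources is not part of the proposal) that keeps only the resources whose target attributes differ from the current state:

```ruby
# Hypothetical helper: given current and target state hashes keyed by title,
# keep only the resources that actually need changes applied.
def changed_resources(current_state, target_state)
  target_state.reject do |title, target|
    current = current_state[title]
    current && target.all? { |attr, value| current[attr] == value }
  end
end

current = { 'a' => { ensure: 'present', id: '123' } }
target  = { 'a' => { ensure: 'present', id: '123' },  # unchanged, filtered out
            'b' => { ensure: 'present', id: '456' } } # missing, kept
puts changed_resources(current, target).keys.inspect  # ["b"]
```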
 
logger.attribute_changed could really have a default value for message: if the implementation provides attribute, old_value & new_value, it shouldn't also need to supply "Changed #{attribute} from #{old_value} to #{new_value}".

That's already specified.
 
Will the attribute type fields be able to use type definitions shipped in the same module or present in the environment?

Currently, anywhere "puppet 4 data types" are mentioned, only the built-in types are usable. This is because the type information is required on the agent, but puppet doesn't make it available yet. This work is tracked in [PUP-7197](https://tickets.puppetlabs.com/browse/PUP-7197), but even once that is implemented, modules will have to wait until the functionality is widely available before being able to rely on it.
 
Will they be able to inspect the catalog or communicate between implementations, like in the concat & concat_fragment case?

My preferred way to solve that would be a new top-level plugin that can inspect and change the catalog after it has been built. That plugin could then collect all the fragments and build a file resource from them, before the catalog is even passed to the agent.


Will they be able to generate resources? That could probably be done with something similar to the logger methods, but there is nothing like that in the proposal. I assume they won't have a generate method.

Adding more discrete functionality and life-cycle methods later seems like an easy extension that will not require a major API revision. This is also possibly another use-case for a catalog transform plugin.
 
Will there be some requirement that these implementations are thread safe so puppet can start managing multiple resources in parallel? Or will they run in some subprocess even?


Theoretically, those implementations could be run in a separate process. Looking at things like Lutterkort's libral, this might happen quicker than I think. As a firm believer in small incremental steps, I only want to speculate on those things as far as I need to avoid blocking a future change. Therefore I'm focusing this new API mostly on being small, self-contained, and decoupled.


Thanks for your time and work in reviewing this!

Regards, David

