Draft for new type and provider API


David Schmitt

Jan 31, 2017, 11:04:19 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi *,

The type and provider API has been the bane of my existence since I [started writing native resources](https://github.com/DavidS/puppet-mysql-old/commit/d33c7aa10e3a4bd9e97e947c471ee3ed36e9d1e2). Now, finally, we'll do something about it. I'm currently working on designing a nicer API for types and providers. My primary goals are to provide a smooth and simple ruby developer experience for both scripters and coders. Secondary goals are to eliminate server-side code and to make puppet 4 data types available. Currently this is completely aspirational (i.e. no real code has been written), but early private feedback was encouraging.

To showcase my vision, this [gist](https://gist.github.com/DavidS/430330ae43ba4b51fe34bd27ddbe4bc7) has the [apt_key type](https://github.com/puppetlabs/puppetlabs-apt/blob/master/lib/puppet/type/apt_key.rb) and [provider](https://github.com/puppetlabs/puppetlabs-apt/blob/master/lib/puppet/provider/apt_key/apt_key.rb) ported over to my proposal. The second example there is a more long-term teaser on what would become possible with such an API.

The new API, like the existing one, has two parts: the implementation that interacts with the actual resources (a.k.a. the provider), and information about what the implementation is all about. Due to the different usage patterns of the two parts, they need to be passed to puppet in two different calls:

The `Puppet::SimpleResource.implement()` call receives the `current_state = get()` and `set(current_state, target_state, noop)` methods. `get` returns a list of discovered resources, while `set` takes the target state and enforces those goals on the subject. There is only a single (ruby) object throughout an agent run, which can easily do caching and whatever else is required for the provider to function well. The state descriptions passed around are simple lists of key/value hashes describing resources. This will allow the implementation wide latitude in how to organise itself for simplicity and efficiency.
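
To make the shape concrete, here is a minimal sketch of what registering such an implementation could look like (the block form and the apt_key details are illustrative only, not settled API):

Puppet::SimpleResource.implement('apt_key') do
  def get
    # query the system once, and return all discovered resources
    # as a list of key/value hashes
    [
      { name: '5FF054BD', ensure: 'present' },
    ]
  end

  def set(current_state, target_state, noop = false)
    # compare the two lists, and enforce the differences on the system;
    # when noop is true, only report what would change
  end
end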

The `Puppet::SimpleResource.define()` call provides a data-only description of the Type. This is all that is needed on the server side to compile a manifest. Thanks to puppet 4 data type checking, this will already be much more strict (with less effort) than possible with the current APIs, while providing more automatically readable documentation about the meaning of the attributes.
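
A data-only definition might look like this (a hypothetical sketch, abbreviated, mirroring the ResourceDefinition example further down this thread):

Puppet::SimpleResource.define(
  name: 'apt_key',
  docs: "Manages GPG keys in apt's keyring.",
  attributes: {
    ensure: {
      type: 'Enum[present, absent]',
      docs: 'Whether the key should be present or absent.',
    },
    id: {
      type: 'Pattern[/\A(0x)?[0-9a-fA-F]{8,40}\Z/]',
      docs: 'The ID of the key to manage.',
      namevar: true,
    },
  },
)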
 

Details in no particular order:

* All of this should fit on any unmodified puppet 4 installation. It is completely additive and optional. For now.

* The Type definition
  * It is data-only.
  * Refers to puppet data types.
  * No code runs on the server.
  * This information can be re-used in all tooling around displaying/working with types (e.g. puppet-strings, console, ENC, etc.).
  * Autorelations are restricted to unmodified attribute values and constant values.
  * No more `validate` or `munge`! For the edge cases not covered by data types, runtime checking can happen in the implementation on the agent. There it can use local system state (e.g. different mysql versions have different max table length constraints), and it will only fail the part of the resource tree that depends on this error. There is already ample precedent for runtime validation, as most remote resources do not try to replicate the validation their target is already doing anyway.
  * It maps 1:1 to the capabilities of PCore, and is similar to the libral interface description (see [libral#1](https://github.com/puppetlabs/libral/pull/2)). This ensures future interoperability between the different parts of the ecosystem.
  * Related types can share common attributes by sharing/merging the attribute hashes.
  * `defaults`, `read_only`, and similar data about attributes in the definition are mostly aesthetic at the current point in time, but will make for better documentation, and allow more intelligence built on top of this later.

* The implementation consists of two simple functions: `current_state = get()` and `set(current_state, target_state, noop)`.
  * `get` on its own is already useful for many things, like `puppet resource`.
  * `set` receives the current state from `get`. While this is necessary for proper operation, there is an inherent race condition if the system state changes between the calls. This is no different from what current implementations face, and they are well-equipped to deal with it.
  * `set` is called with a list of resources, and can do batching if it is beneficial. This is not yet supported by the agent.
  * The `current_state` and `target_state` values are lists of simple data structures built up of primitives like strings, numbers, hashes and arrays. They match the schema defined in the type.
  * Calling `r.set(r.get, r.get)` would ensure the current state. This should run without any changes, proving the idempotency of the implementation (see the test sketch after this list).
  * The ruby instance hosting the `get` and `set` functions is only alive for the duration of an agent transaction. An implementation can provide an `initialize` method to read credentials from the system, and set up other things as required. The single instance is used for all instances of the resource type.
  * There is no direct dependency on puppet core libraries in the implementation.
    * While implementations can use utility functions, they are completely optional.
    * The dependencies on the `logger`, `commands`, and similar utilities can be supplied by a small utility library (TBD).
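
As a test sketch for the idempotency point above (AptKeyImplementation is a hypothetical name for however the runtime will surface the registered `get`/`set` block):

describe 'apt_key implementation' do
  it 'reports no changes when enforcing the current state' do
    r = AptKeyImplementation.new
    current = r.get
    # enforcing the state we just read should be a no-op
    r.set(current, current)
    expect(r.get).to eq(current)
  end
end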

* Having a well-defined small API makes remoting, stacking, proxying, batching, interactive use, and other shenanigans possible, which will make for an interesting time ahead.

* The logging of updates to the transaction is only a sketch. See the usage of `logger` throughout the example. I've tried different styles to see what fits.
  * The `logger` is the primary way of reporting information back to the log, and the report.
  * Results can be streamed for immediate feedback.
  * Block-based constructs allow detailed logging with little code ("Started X", "X: Doing Something", "X: Success|Failure", with one or two calls, and only one reference to X).
 
* Obviously this is not sufficient to cover everything existing types and providers are able to do. For the first iteration we are choosing simplicity over functionality.
  * Generating more resource instances for the catalog during compilation (e.g. file#recurse or concat) becomes impossible with a purely data-driven Type. There is still space in the API to add server-side code.
  * Some resources (e.g. file, ssh_authorized_keys, concat) cannot or should not be prefetched. While it might not be convenient, a provider could always return nothing on the `get()` and do a more customized enforce motion in the `set()`.
  * With current puppet versions, only "native" data types will be supported, as type aliases do not get pluginsynced. Yet.
  * With current puppet versions, `puppet resource` can't load the data types, and therefore will not be able to take full advantage of this. Yet.

* There is some "convenient" infrastructure (e.g. parsedfile) that needs porting over to this model.

* Testing becomes possible on a completely new level. The test library can know how data is transformed outside the API, and - using the shape of the type - start generating test cases, and checking the actions of the implementation. This will require developer help to isolate the implementation from real systems, but it should go a long way towards reducing the tedium in writing tests.


What do you think about this?


Cheers, David


Trevor Vaughan

Feb 1, 2017, 9:50:52 PM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi David,

Most of this looks fine (I still need to digest some of it) but I don't know that I'm a huge fan of no munge or validate functionality. This seems to fly in the face of a 'fail fast' mentality.

Also, if everything happens on the client, some of the catalog munging that I do on the fly isn't going to work any longer.

I *do* like the idea of types being pure data but will this bloat the catalog even further?

I still think that moving toward a heap/stack model would serve the Puppet clients best and get past the resource count limitations.

Fundamentally, I want my clients to do as little work as possible since they're meant to be running business processes.

I'll definitely be keeping an eye on this conversation though.

Thanks,

Trevor




--
Trevor Vaughan
Vice President, Onyx Point, Inc

-- This account not approved for unencrypted proprietary information --

David Schmitt

Feb 2, 2017, 8:42:16 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 2 February 2017 at 02:50, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
Hi David,

Most of this looks fine (I still need to digest some of it) but I don't know that I'm a huge fan of no munge or validate functionality. This seems to fly in the face of a 'fail fast' mentality.

With puppet 4 data types, much of the trivial stuff munge and validate do can be handled much more easily and consistently than before. Many other checks will never be possible on the server, because they depend on values only available on the agent (credential validity, max tablename length dependent on mysql flavour), or would be error-prone duplication of things the target API already checks for us (validity of file mode bit combinations, validity of AMI references when talking to EC2 instances, etc). There is a bit of stuff in between those two (e.g. content & source specified on a file), which we'd trade off for a much simpler server side. Failing on the agent also means that independent parts of the catalog can move forward, instead of failing the whole compilation.
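
To illustrate, such agent-side validation could sit right in the set() implementation (max_table_name_length is a hypothetical helper querying the locally installed mysql):

def set(current_state, target_state, noop = false)
  max_len = max_table_name_length
  target_state.each do |resource|
    if resource[:name].length > max_len
      # only this resource fails; independent parts of the catalog proceed
      logger.err("#{resource[:name]}: table name exceeds #{max_len} characters")
      next
    end
    # ... enforce the resource ...
  end
end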
 
Also, if everything happens on the client, some of the catalog munging that I do on the fly isn't going to work any longer.

Those resources won't look any different from existing types (which will stay as they are for a considerable while). Since I don't know what catalog munging you're doing, I'm having a hard time following here.

I *do* like the idea of types being pure data but will this bloat the catalog even further?

The resources in the catalog won't look any different from today's. In fact, the first version will most likely just call Puppet::Type.newtype() under the hood somewhere.
 
I still think that moving toward a heap/stack model would serve the Puppet clients best and get past the resource count limitations.

Do you have those ideas written up somewhere?
 
Fundamentally, I want my clients to do as little work as possible since they're meant to be running business processes.

Same.
 

I'll definitely be keeping an eye on this conversation though.

Thanks,

Thanks for your time and the work you're putting in here.


David
 

Trevor Vaughan

Feb 2, 2017, 10:38:30 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
I think I'm slowly talking myself into this.

The issue that I've had is that the current Data Types aren't composable. There's a ticket open about that, and hopefully it will fix a lot of my issues.

Unfortunately, it looks like my heap/stack discussion was with Henrik on Slack, so that's not super helpful.

Basically, the idea is that, instead of using a pure object model, objects would be created *as necessary* and the linear execution model would be kept in a simple array (stack). The data space (heap) would contain the attributes that are required for each object to execute.

This means that the data structure would be extremely efficient and there would be almost no cost to building and manipulating an object graph.

The benefit of the object graph is to keep the object model. However, the object model is simply not necessary and has proven not to scale to 10k+ resources. The Heap/Stack model should scale based on the amount of data that you have and would be *extremely* portable to other languages.
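
A toy sketch of that idea (all names hypothetical, purely illustrative):

# the heap holds each resource's attributes, keyed by reference
heap = {
  'Package[vim]' => { ensure: 'installed' },
  'File[/tmp/a]' => { ensure: 'file', mode: '0644' },
}
# the stack holds the linear execution order
stack = ['Package[vim]', 'File[/tmp/a]']

stack.each do |ref|
  apply_resource(ref, heap[ref]) # apply_resource is a hypothetical function
end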

The biggest down side (for me) is that I manipulate the catalog graph at run time for various reasons. There would probably be an easy way to work this into the API though since it would just be an array insertion.

In terms of the validate/munge question, munging can *absolutely* be done client side. But do you even want to pass a catalog to the client if you have invalid data that might be used by other resources? Also, I do quite a bit of checking based on the values of other parameters/properties. If this could be maintained reasonably, that would be fine.

I do think that having a global data space would be extremely important to making this feasible. There are a couple of tickets open for this as well, but I don't know the numbers off hand. And, of course, if we use the heap/stack model, the data is all in place (but may change based on resource processing).

Hopefully this all makes sense.

Thanks,

Trevor



David Schmitt

Feb 2, 2017, 11:19:56 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 2 February 2017 at 15:38, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
I think I'm slowly talking myself into this.

The issue that I've had is that the current Data Types aren't composable. There's a ticket open about that, and hopefully it will fix a lot of my issues.

Assuming you're talking about PUP-7033, there is nothing much I can help you with there.


Unfortunately, it looks like my heap/stack discussion was with Henrik on Slack, so that's not super helpful.

Basically, the idea is that, instead of using a pure object model, objects would be created *as necessary* and the linear execution model would be kept in a simple array (stack). The data space (heap) would contain the attributes that are required for each object to execute.

This means that the data structure would be extremely efficient and there would be almost no cost to building and manipulating an object graph.

The benefit of the object graph is to keep the object model. However, the object model is simply not necessary and has proven not to scale to 10k+ resources. The Heap/Stack model should scale based on the amount of data that you have and would be *extremely* portable to other languages.

The agent requires the whole graph for its operation. Since v1 of this will only build on top of the existing APIs, I expect the (memory) situation to get worse before it gets better. The new API itself doesn't need a huge object model for its operation. The resources are passed around as simple arrays and hashes. That could be much more compact than the current glut of Puppet::Provider instances.
 
The biggest down side (for me) is that I manipulate the catalog graph at run time for various reasons. There would probably be an easy way to work this into the API though since it would just be an array insertion.

Yeah, things like file#recurse, or what concat_file does with `generate`, would not be possible with this new API. That is a deliberate decision to skip a complex fringe feature. I'd love to see what you're doing with it, to get a better handle on how to re-introduce this in a more pleasant form. Within the next few years, you'll still be able to just use the puppet 2 T&P API as-is, if you need to.
 
In terms of the validate/munge question, munging can *absolutely* be done client side. But do you even want to pass a catalog to the client if you have invalid data that might be used by other resources? Also, I do quite a bit of checking based on the values of other parameters/properties. If this could be maintained reasonably, that would be fine.

The hard and generic case of error handling is dynamic errors during application of the catalog. Puppet will always have to deal with those gracefully. Are there some validations that could be done earlier than during the update? Absolutely. How much does it buy us, though? If you have pointers to cool things you're currently doing, I'd be very interested in seeing those. No worries about timing. Due to travels, this proposal won't go anywhere in the next two weeks. And I'd rather have the full picture anyway.
 
I do think that having a global data space would be extremely important to making this feasible. There are a couple of tickets open for this as well, but I don't know the numbers off hand. And, of course, if we use the heap/stack model, the data is all in place (but may change based on resource processing).

Having a proper place to do global transformations on the catalog, both agent- and server-side, would be beneficial to many advanced use cases. As I said above, the current way to do so will not go away anytime soon. This Resource API just won't make it any less painful.
 

Hopefully this all makes sense.

Very much so. I hope the same.


Cheers, David

Martin Alfke

Feb 3, 2017, 6:12:22 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi David,

many thanks for sharing your ideas.

As you already know: personally I don’t say that types and providers are way too difficult.
It is more a matter of how to explain them.
I will dig through your document over the weekend. We can have a chat on Monday in Ghent.

But besides this:
is your idea somehow related to the libral project?

Best,
Martin



David Schmitt

Feb 3, 2017, 7:23:50 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 3 February 2017 at 11:12, Martin Alfke <tux...@gmail.com> wrote:
Hi David,

many thanks for sharing your ideas.

As you already know: personally I don’t say that types and providers are way too difficult.
It is more a matter of how to explain them.
I will dig through your document over the weekend. We can have a chat on Monday in Ghent.

But besides this:
is your idea somehow related to the libral project?


 
  * It maps 1:1 to the capabilities of PCore, and is similar to the libral interface description (see [libral#1](https://github.com/puppetlabs/libral/pull/2)). This ensures future interoperability between the different parts of the ecosystem.

libral is a neighbouring project with its own set of distinct design goals. Staying bidirectionally compatible is a necessary prerequisite for this proposal. (Which can be handled on both sides, depending on a wide variety of things.)

Cheers, David

Trevor Vaughan

Feb 3, 2017, 8:01:08 PM
to puppe...@googlegroups.com, puppet...@googlegroups.com
I'm still thinking about a lot of this but one thing that I will ask is that you don't drop the actual compiled catalog (in whatever form).

One of the reasons that Puppet is good for compliance-focused architectures is that I can easily hook it to validate input on the fly.


Thanks,

Trevor


Martin Alfke

Feb 5, 2017, 8:35:17 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi David,

I found some time to dig into your proposal.
It seems as if we are following completely different approaches. I hope that my email will not upset or offend you.

Usually I hear a lot of people complaining that types and providers are the most complex thing to do with Puppet.
Especially since the DSL is easy to read and understand.
When I thought about “making types and providers easier to use”, I always thought about the simplicity of the Puppet DSL, and never about a new ruby API.

I am happy to chat with you tomorrow. But I don’t see how types and providers will become easier to use with your proposed API.
It seems to me as if you are “hiding” some things which are hard to understand - similar to mk_resource_methods, which is useful when following some silent assumptions.
I always prefer Puppet to be as simple as possible, which means: make everything as obvious as possible.

Different from your proposal, I wondered whether it would be possible to reuse the Puppet DSL for types and providers.
Maybe it is a silly idea. I am interested in your opinion.

Think of something like the following (yes, I know, “type” is already a reserved word):

type my_module::my_file (
  Enum['file', 'absent' ] $ensure,
  Stdlib::Absolute_path   $path = $title,
  String                  $owner,
){
  def create {
    exec { "create file ${path}":
      command => "touch ${path}",
      path    => '/usr/bin:/bin',
      creates => $path,
    }
  }
  def destroy {
    file { $path:
      ensure => absent,
    }
  }
  property owner {
    exec { "chown ${owner} on ${path}":
      command => "chown ${owner} ${path}",
      path    => '/bin:/usr/bin',
      unless  => '',
    }
  }
}

(I know that exec should not be used in this way.)


Always think about what users want to do:

People with ruby experience will not find types and providers difficult (in the way they are done today).

But Puppet was originally written for Ops people who don’t like scripting or coding.
They were happy having a readable, easy to understand DSL instead.
These are the ones for whom you want to make types/providers easier.

Looking forward to seeing you tomorrow or later this evening.

Best,
Martin

Corey Osman

Feb 5, 2017, 3:52:58 PM
to Puppet Developers, voxp...@groups.io, puppet...@googlegroups.com
One improvement I would like to see with the provider API is around getting the current state.  I see many beginners, including myself, trying to add way more complexity than required in the exists? method.

One would think that, in order to figure out if the resource exists, you should just query the system and DIY.  And I'll pick on my own code as an example: https://github.com/logicminds/bmclib/blob/0.1.0/lib/puppet/provider/bmc/ipmitool.rb#L34

However, this is not the case.  This is the exact opposite of what you need to do. Instead you need to query the API, which is hard to understand and not obvious at first, because instinct tells you to get the state yourself.


Another improvement which I hope would get addressed is self.prefetch.  While self.instances is pretty simple to understand, the prefetch method could probably use a better name and an easier way to implement it. It was not obvious at first how to implement. So many people don't implement these two methods when they should, which would make their providers more useful.


The type checking looks pretty amazing and simple to understand and implement in the type code.

Additionally, as Trevor mentioned, I also have some projects (puppet-debugger and puppet-retrospec) that utilize the current Puppet API to introspect resources and query the catalog, so please make sure this new code doesn't break my projects. I'll be happy to test out any new API for third-party compatibility.


Corey  (NWOps)

David Lutterkort

Feb 6, 2017, 5:51:43 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi Martin,

On Sun, Feb 5, 2017 at 2:35 PM, Martin Alfke <tux...@gmail.com> wrote:
 
It seems to me as if you are “hiding” some things which are hard to understand - similar to mk_resource_methods, which is useful when following some silent assumptions.

My big hope is that we can do away with all that internal machinery and replace it with a very simple interface between types and providers (roughly: look up existing resources and change resources; what David has as set and get in his proposal).
 
Think of something like the following (yes, I know, “type” is already a reserved word):

type my_module::my_file (
  Enum['file', 'absent' ] $ensure,
  Stdlib::Absolute_path   $path = $title,
  String                  $owner,
){
  def create {
    exec { "create file ${path}":
      command => "touch ${path}",
      path    => '/usr/bin:/bin',
      creates => $path,
    }
  }
  def destroy {
    file { $path:
      ensure =>  absent,
    }
  }
  property owner {
    exec { "chown ${owner} on ${path}":
      command => "chown ${owner} ${path}",
      path    => '/bin:/usr/bin',
      unless  => '',
    }
  }
}

(I know that exec should not be used in this way.)

I think it's totally reasonable for people to write Puppet code that acts like a type/provider; but that's possible today as you could turn the above code into a defined type.

To me, the point of the type/provider mechanism is to allow extending what Puppet can manage beyond what's easily accessible via Puppet. I think that simplification here means that the provider is exposed to a larger-scale change, i.e. is told to enforce a particular resource. Looking, for example, at this systemd provider or this dnf provider  makes me think that breaking change into small pieces (per property) actually makes it harder to write providers, not easier. There's clearly room for adding some language-specific conveniences, but even without that I think the code is pretty straightforward.

People with ruby experience will not find types and providers difficult (in the way they are done today).

But Puppet was originally written for Ops people who don’t like scripting or coding.

Yes, this is a really thorny one - my hunch is that writing types and providers will always be an advanced topic, and it's therefore reasonable to assume some familiarity with scripting.

David

David Schmitt

Feb 7, 2017, 5:31:13 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Hi Corey,

thanks for looking into this. I've been travelling, so please accept my apologies for the late answer.


On 5 February 2017 at 20:52, Corey Osman <co...@logicminds.biz> wrote:
One improvement I would like to see with the provider API is around getting the current state.  I see many beginners, including myself, trying to add way more complexity than required in the exists? method.

One would think that, in order to figure out if the resource exists, you should just query the system and DIY.  And I'll pick on my own code as an example: https://github.com/logicminds/bmclib/blob/0.1.0/lib/puppet/provider/bmc/ipmitool.rb#L34

However, this is not the case.  This is the exact opposite of what you need to do. Instead you need to query the API, which is hard to understand and not obvious at first, because instinct tells you to get the state yourself.


Another improvement which I hope would get addressed is self.prefetch.  While self.instances is pretty simple to understand, the prefetch method could probably use a better name and an easier way to implement it. It was not obvious at first how to implement. So many people don't implement these two methods when they should, which would make their providers more useful.

This is exactly what this proposal should address. The implementation of `get` is very straightforward (and, currently, required).

The type checking looks pretty amazing and simple to understand and implement in the type code.

Thanks! I hope that a provider author will never have to "implement" this. The SimpleResource will create a proper Type class underneath, with all the validation and munging generated.

 

Additionally, as Trevor mentioned, I also have some projects (puppet-debugger and puppet-retrospec) that utilize the current Puppet API to introspect resources and query the catalog, so please make sure this new code doesn't break my projects. I'll be happy to test out any new API for third-party compatibility.

Thanks for the offer! For the time being, the implementation will sit completely on top of the unmodified puppet code to work on all puppet 4 versions out there. I also don't expect existing types to switch immediately, but I do hope that the new API makes implementations so much easier that people will switch quickly.

Then it will be a matter of filling in the gaps for the special cases.

Cheers, David S

 


Thomas Hallgren

Feb 7, 2017, 5:44:47 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
Like Martin Alfke above, I'm very interested in what could be done using the Puppet Language alone. What use cases can be covered entirely that way? What is not feasible to ever do with PL? Would an API that made it possible to write types and providers in PL and hand over tiny tasks to another language (Ruby, C, Go, etc.) be of interest? I.e. a clean-cut boundary between the things possible (appropriate) to implement with PL and the things that aren't?

Perhaps this is a first step in that direction?

- thomas


David Schmitt

Feb 7, 2017, 6:33:52 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On 7 February 2017 at 10:44, Thomas Hallgren <thomas....@puppet.com> wrote:
Like Martin Alfke above, I'm very interested in what could be done using the Puppet Language alone. What use cases can be covered entirely that way?

As David L pointed out, many of these things can be done today in defines or classes already.

What is not feasible to ever do with PL?

Talk to external APIs.
 
Would an API that made it possible to write types and providers in PL and hand over tiny tasks to another language (Ruby, C, Go, etc.) be of interest? I.e. a clean-cut boundary between the things possible (appropriate) to implement with PL and the things that aren't?

I do not understand how extending the puppet dsl on the server helps with resource management on the agent.

 
Perhaps this is a first step in that direction?

In one possible reading, types already are an extension point of the DSL, and this proposal makes that less painful, and easier to reason about.

Cheers, David

PS: If anyone is at ConfigurationManagementCamp in Gent these days, hit me up to talk about this!

 

Thomas Hallgren

Feb 7, 2017, 7:30:20 AM
to puppe...@googlegroups.com, voxp...@groups.io, puppet...@googlegroups.com
On Tue, Feb 7, 2017 at 12:33 PM, David Schmitt <david....@puppet.com> wrote:

I do not understand how extending the puppet dsl on the server helps with resource management on the agent.

A precursor to this would of course be the ability to use the parser on the agent, and also an evaluator tailored for the agent. We are already doing this with Pcore types (parse+eval on the agent), which means that types and providers can use them. This could be viewed as a continuation of that work.

 

 
Perhaps this is a first step in that direction?

In one possible reading, types are already are a extension point of the DSL, and this proposal makes that less painful, and easier to reason about.

yes, but it is still a) locked down to Ruby and b) puts the demand on the user to learn an additional language.

- thomas

 

jcbollinger

Feb 7, 2017, 12:13:44 PM
to Puppet Users, puppe...@googlegroups.com, voxp...@groups.io


On Tuesday, January 31, 2017 at 10:04:27 AM UTC-6, David Schmitt wrote:

The type and provider API has been the bane of my existence since I [started writing native resources](https://github.com/DavidS/puppet-mysql-old/commit/d33c7aa10e3a4bd9e97e947c471ee3ed36e9d1e2).


Honestly, I've never thought the type / provider system was too bad.  Certainly it works pretty well for simple things.  Now, I'm more of a Dev who does Ops than an Op who Devs, but from my perspective, the biggest problem with Puppet's types & providers has always been inadequate documentation.  The docs in this area are a bit better now than when I started with Puppet around eight years ago, but they have always missed presenting many relevant details about how the host Puppet instance, resource instances, and their associated provider instances interact.  This new effort doesn't seem to be targeting any of that.

 
Now, finally, we'll do something about it. I'm currently working on designing a nicer API for types and providers. My primary goals are to provide a smooth and simple ruby developer experience for both scripters and coders. Secondary goals are to eliminate server-side code and to make puppet 4 data types available. Currently this is completely aspirational (i.e. no real code has been written), but early private feedback was encouraging.



I'm confused and dubious: how is it that the guiding goals for this effort are all about form rather than function?  Surely, if you intend for the result to serve any particular function(s) then you need to be guided at least in part by which functions those are.  Perhaps for you that's tied up in what you mean by "a type and provider API", but I am confident that some of the details of what that means to you differ from what it means to others.  You really need to be explicit with this if you want good feedback.  Moreover, inasmuch as you seem to imagine this API eventually supplanting the existing one, it may save a lot of future grief and expense to make sure everyone is on the same page early on.

Moreover, none of the goals given address the needs of the API client, which is Puppet itself.  Or do you have a different concept of who the client is?  If so, then you and I have even more different ideas of what a "type and provider API" is than I had supposed.

Furthermore, as wonderful as "a smooth and simple ruby developer experience for both scripters and coders" sounds, I think it's utterly unrealistic.  People who aren't good at writing native code or who just want to avoid it have much different wants and expectations from those who are comfortable with writing native code.  In fact, Puppet already has support for "scripters": this is what defined types are all about.  People who don't want to be writing Ruby are unlikely ever to have a smooth or simple Ruby developer experience.  The most likely outcome of trying to simplify any more than serves coders well is producing something  insufficiently powerful / flexible.

 
The `Puppet::SimpleResource.implement()` call receives the `current_state = get()` and `set(current_state, target_state, noop)` methods. `get` returns a list of discovered resources, while `set` takes the target state and enforces those goals on the subject. There is only a single (ruby) object throughout an agent run, which can easily do caching and whatever else is required for the provider to function well. The state descriptions passed around are simple lists of key/value hashes describing resources. This will allow the implementation wide latitude in how to organise itself for simplicity and efficiency.



So there is no mechanism for obtaining the current state of an individual resource?  That seems a bit of an oversight.  There are resource types that cannot feasibly be fully enumerated (e.g. files), and even resource types that cannot be enumerated at all.

Also, though you say elsewhere that the provider's set() method receives a list of resources to update, that is not borne out by your example code.

 
The `Puppet::SimpleResource.define()` call provides a data-only description of the Type. This is all that is needed on the server side to compile a manifest. Thanks to puppet 4 data type checking, this will already be much more strict (with less effort) than possible with the current APIs, while providing more automatically readable documentation about the meaning of the attributes.



I'm not at all convinced that your concept of a resource type definition indeed covers everything needed -- or at least wanted -- on the server side.  For example, I'm all for engaging the type system, but
  1. The type system does not cover all potential validation needs.  In particular, it cannot perform joint validation on multiple parameters.
  2. Using the type system for any but pretty simple validity checks will run against your objective of providing a simple interface.
Also, if handling types does not involve any server-side code, does that mean that the type system is not engaged to validate parameters until the data reach the client?  If so, that in itself is a controversial interpretation of what is and is not needed on the server side.

I'm not prepared to offer feedback on the details at this time, because I don't think you've sufficiently established what the effort is trying to accomplish.


John

David Schmitt

Feb 8, 2017, 9:20:20 AM
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
On 7 February 2017 at 17:13, jcbollinger <John.Bo...@stjude.org> wrote:


On Tuesday, January 31, 2017 at 10:04:27 AM UTC-6, David Schmitt wrote:

The type and provider API has been the bane of my existence since I [started writing native resources](https://github.com/DavidS/puppet-mysql-old/commit/d33c7aa10e3a4bd9e97e947c471ee3ed36e9d1e2).


Honestly, I've never thought the type / provider system was too bad.  Certainly it works pretty well for simple things.  Now, I'm more of a Dev who does Ops than an Op who Devs, but from my perspective, the biggest problem with Puppet's types & providers has always been inadequate documentation.  The docs in this area are a bit better now than when I started with Puppet around eight years ago, but they have always missed presenting many relevant details about how the host Puppet instance, resource instances, and their associated provider instances interact.

What I read here is that the current state is complex enough that a couple of books and a crack team of tech writers couldn't explain it well.
 
  This new effort doesn't seem to be targeting any of that.

Looping back to the "lack of documentation", I see this lack grounded partially in the huge surface area that needs to be covered, since the existing T&P API is deeply entangled with the catalog compilation, and the agent transactions.

This new API has two entry points that take a small number of easily explained arguments, making full documentation realistic. From this (and others') reactions I do understand now that I should have started with a more README-driven approach, to highlight the difference more. Let's try this again:

The SimpleResource (working title) consists of three distinct but cooperating parts. The Definition describes the shape of a resource: its attributes, their types, which one is the namevar, and the doc strings. The get method returns a list of hashes, each matching the shape of the resource as defined in the Definition. For example, a list of apt_key resources (here rendered as JSON) would look like this:

[
  {
    "name": "BBCB188AD7B3228BCF05BD554C0BE21B5FF054BD",
    "ensure": "present",
    "fingerprint": "BBCB188AD7B3228BCF05BD554C0BE21B5FF054BD",
    "long": "4C0BE21B5FF054BD",
    "short": "5FF054BD",
    "size": "2048",
    "type": "rsa",
    "created": "2013-06-07 23:55:31 +0100",
    "expiry": null,
    "expired": false
  },
  {
    "name": "B71ACDE6B52658D12C3106F44AB781597254279C",
    "ensure": "present",
    "fingerprint": "B71ACDE6B52658D12C3106F44AB781597254279C",
    "long": "4AB781597254279C",
    "short": "7254279C",
    "size": "1024",
    "type": "dsa",
    "created": "2007-03-08 20:17:10 +0000",
    "expiry": null,
    "expired": false
  },
  ...
]


The set method takes a list of target goals in the same format and is responsible for enforcing those goals on the target system. When calling the set method, the caller may provide the current state (as received from a recent get call) to allow the set method to shortcut reading the existing state again. The basic pattern for the set method looks like this:

def set(current_state, target_state, noop = false)
  existing_keys = Hash[current_state.collect { |k| [k[:name], k] }]
  target_state.each do |resource|
    # additional validation for this resource goes here

    current = existing_keys[resource[:name]]
    if current && resource[:ensure].to_s == 'absent'
      # delete the resource
    elsif current && resource[:ensure].to_s == 'present'
      # update the resource
    elsif !current && resource[:ensure].to_s == 'present'
      # create the resource
    end
  end
end


 
Now, finally, we'll do something about it. I'm currently working on designing a nicer API for types and providers. My primary goals are to provide a smooth and simple ruby developer experience for both scripters and coders. Secondary goals are to eliminate server-side code and to make puppet 4 data types available. Currently this is completely aspirational (i.e. no real code has been written), but early private feedback was encouraging.



I'm confused and dubious: how is it that the guiding goals for this effort are all about form rather than function?  Surely, if you intend for the result to serve any particular function(s) then you need to be guided at least in part by which functions those are.  Perhaps for you that's tied up in what you mean by "a type and provider API", but I am confident that some of the details of what that means to you differ from what it means to others.  You really need to be explicit with this if you want good feedback.  Moreover, inasmuch as you seem to imagine this API eventually supplanting the existing one, it may save a lot of future grief and expense to make sure everyone is on the same page early on.

Thanks for pointing out where things didn't make it out of my head into this text.  Part of the motivation is hidden in the note around libral. There are a few connected efforts/conversations around making the providers an independent entity that can provide value in more situations than the agent transaction:

* libral is addressing the local runtime environment
* this proposal is more focused on the shape of the data required to "drive" a provider
* remote resources (e.g. AWS, network devices) are becoming more prevalent, and might be better served by something other than puppet agent. For example, puppet device is a thing, but is not well integrated into the general ecosystem

The underlying connection is that the functionality of providers (e.g. "how to install a package", or "how to procure a VM in AWS") has a value beyond "can be driven in a puppet agent transaction". Easy examples are "puppet resource" as an exploratory tool, and "puppet device" for remote resources. Gareth's puppetlabs-inventory, everyone hooking into PuppetDB, and by extension efforts like augeas, libelektra, and openlmi all were about providing uniform API access to heterogeneous sets of resources.
 
Moreover, none of the goals given address the needs of the API client, which is Puppet itself.  Or do you have a different concept of who the client is?  If so, then you and I have even more different ideas of what a "type and provider API" is than I had supposed.

As argued above, there is value to be unlocked by enabling callers of the Resource API other than puppet agent. That said, the puppet agent will in any reasonable future remain the primary driver for all of this, which also leads to the balancing act of providing an easily callable API that still hooks up into the current infrastructure.

 

Furthermore, as wonderful as "a smooth and simple ruby developer experience for both scripters and coders" sounds, I think it's utterly unrealistic.  People who aren't good at writing native code or who just want to avoid it have much different wants and expectations from those who are comfortable with writing native code.  In fact, Puppet already has support for "scripters": this is what defined types are all about.  People who don't want to be writing Ruby are unlikely ever to have a smooth or simple Ruby developer experience.  The most likely outcome of trying to simplify any more than serves coders well is producing something  insufficiently powerful / flexible.

In my book, scripters are smart people who build upon existing code snippets (be that from existing pieces, or documentation), while coders take a more formal approach, revolving around exploring a wider swath of the surrounding infrastructure and principles. My hope is that a runtime model that can be fully explained in a few sentences (see above), and is more inspectable, will help both groups focus on the real problems they are tasked to solve.

The `Puppet::SimpleResource.implement()` call receives the `current_state = get()` and `set(current_state, target_state, noop)` methods. `get` returns a list of discovered resources, while `set` takes the target state and enforces those goals on the subject. There is only a single (ruby) object throughout an agent run, which can easily do caching and whatever else is required for the provider to function well. The state descriptions passed around are simple lists of key/value hashes describing resources. This will allow the implementation wide latitude in how to organise itself for simplicity and efficiency.



So there is no mechanism for obtaining the current state of an individual resource?  That seems a bit of an oversight.  There are resource types that cannot feasibly be fully enumerated (e.g. files), and even resource types that cannot be enumerated at all.

As mentioned in the original text: "Some resources (e.g. file, ssh_authorized_keys, concat) cannot or should not be prefetched. While it might not be convenient, a provider could always return nothing on the get() and do a more customized enforce motion in the set()." This is a deliberate tradeoff now, to build a simple, robust foundation for future evolution. Adding an optional `find_by_name` function that can do this remains an option for later.
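
Such an optional function might look like this (purely illustrative, not part of the current proposal):

def find_by_name(name)
  # read a single resource from the system on demand, for types
  # that cannot (or should not) be enumerated wholesale by get()
end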


Also, though you say elsewhere that the provider's set() method receives a list of resources to update, that is not borne out by your example code.

The example code only shows the Resource itself. The scaffolding around it to hook it up into the existing system does not exist yet. On my last flight I wrote a very simple type and provider to forward puppet's API into calls of get/set functions as they are described here, and that worked surprisingly well, so I'm confident now that I can really implement something to dynamically create this scaffolding based on the resource definition.
 
The `Puppet::SimpleResource.define()` call provides a data-only description of the Type. This is all that is needed on the server side to compile a manifest. Thanks to puppet 4 data type checking, this will already be much more strict (with less effort) than possible with the current APIs, while providing more automatically readable documentation about the meaning of the attributes.



I'm not at all convinced that your concept of a resource type definition indeed covers everything needed -- or at least wanted -- on the server side.  For example, I'm all for engaging the type system, but
  1. The type system does not cover all potential validation needs.  In particular, it cannot perform joint validation on multiple parameters.
  2. Using the type system for any but pretty simple validity checks will run against your objective of providing a simple interface.

The first point is true already for most existing non-data-type validations, except for the example you brought up. The second point is the reason I don't propose doing that.
 
Also, if handling types does not involve any server-side code, does that mean that the type system is not engaged to validate parameters until the data reach the client?  If so, that in itself is a controversial interpretation of what is and is not needed on the server side.

No server-side code provided by the developer of the Resource. Of course the puppet compiler will be able to cross-check the types from the define() call with whatever is passed in through the manifest.
 

I'm not prepared to offer feedback on the details at this time, because I don't think you've sufficiently established what the effort is trying to accomplish.


Thank you for your time and work. I really appreciate the honest and detailed feedback!


David

PS: I'm travelling this and next week. While it might take me a day or two to get around to answering, I will process all mails before any decision on this project is made.

David Schmitt

Feb 27, 2017, 9:59:46 AM
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
Hi folks,

I've written down the design in README format now. This should make it easier to ingest from a module developer's perspective. Meanwhile I've also had a shot at a hardcoded implementation, and it is technically possible to implement this, too.

-----------

Resource API

A resource is the basic thing that is managed by puppet. Each resource has a set of attributes describing its current state. Some of the attributes can be changed throughout the lifetime of the resource, some attributes only report back but cannot be changed (see read_only), and others can only be set once during initial creation (see init_only). To gather information about those resources, and to enact changes in the real world, puppet requires a piece of code to implement this interaction. The implementation can have parameters that influence its mode of operation (see operational_parameters). To describe all these parts to the infrastructure, and the consumers, the resource Definition (f.k.a. 'type') contains definitions for all of them. The Implementation (f.k.a. 'provider') contains code to get and set the system state.

Resource Definition

Puppet::ResourceDefinition.register(
    name: 'apt_key',
    docs: <<-EOS,
      This type provides Puppet with the capabilities to manage GPG keys needed
      by apt to perform package validation. Apt has its own GPG keyring that can
      be manipulated through the `apt-key` command.

      apt_key { '6F6B15509CF8E59E6E469F327F438280EF8D349F':
        source => 'http://apt.puppetlabs.com/pubkey.gpg'
      }

      **Autorequires**:
      If Puppet is given the location of a key file which looks like an absolute
      path this type will autorequire that file.
    EOS
    attributes:   {
        ensure:      {
            type: 'Enum[present, absent]',
            docs: 'Whether this apt key should be present or absent on the target system.'
        },
        id:          {
            type:    'Variant[Pattern[/\A(0x)?[0-9a-fA-F]{8}\Z/], Pattern[/\A(0x)?[0-9a-fA-F]{16}\Z/], Pattern[/\A(0x)?[0-9a-fA-F]{40}\Z/]]',
            docs:    'The ID of the key you want to manage.',
            namevar: true,
        },
        # ...
        created:     {
            type:      'String',
            docs:      'Date the key was created, in ISO format.',
            read_only: true,
        },
    },
    autorequires: {
        file:    '$source', # will evaluate to the value of the `source` attribute
        package: 'apt',
    },
)

The Puppet::ResourceDefinition.register(options) function takes a Hash with the following top-level keys:

  • name: the name of the resource. For autoloading to work, the whole function call needs to go into lib/puppet/type/<name>.rb.
  • docs: a doc string that describes the overall working of the type, gives examples, and explains pre-requisites as well as known issues.
  • attributes: a hash mapping attribute names to their details. Each attribute is described by a hash containing the puppet 4 data type, a docs string, and whether the attribute is the namevar, read_only, and/or init_only.
  • operational_parameters: a hash mapping parameter names to puppet data types and docs strings.
  • autorequires, autobefore, autosubscribe, and autonotify: a Hash mapping resource types to titles. Currently the titles must either be constants, or, if the value starts with a dollar sign, a reference to the value of an attribute.

Resource Implementation

At runtime the current and intended system states for a single resource instance are always represented as ruby Hashes of the resource's attributes, and applicable operational parameters.

The two fundamental operations to manage resources are reading and writing system state. These operations are implemented in the ResourceImplementation as get and set:

Puppet::ResourceImplementation.register('apt_key') do
  def get
    {
      'title' => {
        name: 'title',
        # ...
      },
    }
  end

  def set(current_state, target_state, noop: false)
    target_state.each do |title, resource|
      # ...
    end
  end
end

The get method returns a Hash of all resources currently available, keyed by their title. If the get method raises an exception, the implementation is marked as unavailable during the current run, and all resources of its type will fail. The error message will be reported to the user.

The set method updates resources to the state defined in target_state. For convenience, current_state contains the last available system state from a prior get call. When noop is set to true, the implementation must not change the system state, but only report what it would change.

Runtime Environment

The primary runtime environment for the implementation is the puppet agent, a long-running daemon process. The implementation can also be used in the puppet apply command, a one-shot version of the agent, or the puppet resource command, a short-lived CLI process for listing or managing a single resource type. Other callers who want to access the implementation will have to emulate those environments. In any case, the registered block will be surfaced in a clean class which will be instantiated once for each transaction. The implementation can define any number of helper methods to support itself. To allow a transaction to set up the prerequisites for an implementation, and use it immediately, the provider is instantiated as late as possible. A transaction will usually call get once, and may call set any number of times to effect change. The host object can be used to cache ephemeral state during execution. The implementation should not try to cache state beyond the transaction, to avoid interfering with the agent daemon. In many other cases caching beyond the transaction won't help anyway, as the hosting process will only manage a single transaction.
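
For example, per-transaction setup and caching could be handled like this (a sketch; fetch_keys_from_system stands in for the real system query):

Puppet::ResourceImplementation.register('apt_key') do
  def initialize
    # runs once per transaction: read credentials, open connections,
    # and prime per-transaction caches here
    @keys = nil
  end

  def get
    # cache only within the current transaction
    @keys ||= fetch_keys_from_system
  end

  def set(current_state, target_state, noop: false)
    # ... enact changes, invalidating the cache as needed ...
    @keys = nil
  end
end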

Utilities

The runtime environment provides some utilities to make the implementation's life easier, and provide a uniform experience for its users.

Commands

To use CLI commands in a safe and comfortable manner, the implementation can use the commands method to access shell commands. You can either use a full path, or a bare command name. In the latter case puppet will use the system PATH setting to search for the command. If the commands are not available, an error will be raised and the resources will fail in this run. The commands are aware of whether noop is in effect or not, and will skip the actual execution if necessary.

Puppet::ResourceImplementation.register('apt_key') do
  commands apt_key: '/usr/bin/apt-key'
  commands gpg: 'gpg'

This will create methods called apt_key and gpg, which will take CLI arguments without any escaping, and run them in a safe environment (clean working directory, clean environment). For example, to call apt-key to delete a specific key by id:

apt_key 'del', key_id

By default the stdout of the command is logged to debug, while the stderr is logged to warning. To access the stdout in the implementation, use the command name with _lines appended, and process it through the returned Enumerable line-by-line. For example, to process the list of all apt keys:

apt_key_lines(%w{adv --list-keys --with-colons --fingerprint --fixed-list-mode}).collect do |line|
  # process each line here, and return a result
end

Note: the output of the command is streamed through the Enumerable. If the implementation requires the exit value of the command before processing, or wants to cache the output, use to_a to read the complete stream in one go.
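
For example, building on the apt_key command registered above:

keys = apt_key_lines(%w{adv --list-keys --with-colons}).to_a
# the command has now run to completion: $? holds its exit status,
# and keys can be processed or cached freely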

If the command returns a non-zero exit code, an error is signalled to puppet. If this happens during get, all managed resources of this type will fail. If this happens during a set, all resources that have been scheduled for processing in this call, but not yet marked as a success, will be marked as failed. To avoid this behaviour, call the try_-prefixed variant. In this (hypothetical) example, apt-key signals already-deleted keys with an exit code of 1, which is OK when the implementation is trying to delete the key:

try_apt_key 'del', key_id

if [0, 1].include? $?.exitstatus
  # success, or already deleted
else
  # fail
end

The exit code is signalled through the ruby standard variable $? as a Process::Status object.

Logging and Reporting

The implementation needs to signal changes, successes and failures to the runtime environment. The logger provides a structured way to do so.

General messages

To provide feedback about the overall operation of the implementation, the logger provides the usual set of loglevel methods that take a string, and pass that up to puppet's logging infrastructure:

logger.warning("Unexpected state detected, continuing in degraded mode.")

will result in the following message:

Warning: apt_key: Unexpected state detected, continuing in degraded mode.

The following loglevels are available:
  • debug: detailed messages to understand everything that is happening at runtime; only shown on request
  • info: high level progress messages; especially useful before long-running operations, or before operations that can fail, to provide context to interactive users
  • notice: use this loglevel to indicate state changes and similar events of notice from the regular operations of the implementation
  • warning: signal error conditions that do not (yet) prohibit execution of the main part of the implementation; for example deprecation warnings, temporary errors.
  • err: signal error conditions that have caused normal operations to fail
  • critical/alert/emerg: should not be used by resource implementations

See Wikipedia and RFC 5424 for more details.

Logging contexts

Most of an implementation's messages are expected to be relative to a specific resource instance, and a specific operation on that instance. For example, to report the change of an attribute:

logger.attribute_changed(title: title, attribute: attribute, old_value: old_value, new_value: new_value, message: "Changed #{attribute} from #{old_value.inspect} to #{new_value.inspect}")

To enable detailed logging without repeating key arguments, and provide consistent error logging, the logger provides logging context methods that capture the current action and resource instance.

logger.updating(title: title) do
  if key_not_found
    logger.warning('Original key not found') 
  end

  # Update the key by calling CLI tool
  apt_key(...)

  logger.attribute_changed(
    attribute: 'content', 
    old_value: nil,
    new_value: content_hash, 
    message: "Created with content hash #{content_hash}")
end

will result in the following messages (of course, with the #{} sequences replaced by the true values):

Debug: Apt_key[#{title}]: Started updating
Warning: Apt_key[#{title}]: Updating: Original key not found
Debug: Apt_key[#{title}]: Executing 'apt-key ...'
Debug: Apt_key[#{title}]: Successfully executed 'apt-key ...'
Notice: Apt_key[#{title}]: Updating content: Created with content hash #{content_hash}
Notice: Apt_key[#{title}]: Successfully updated
# TODO: update messages to match current log message formats for resource messages

In the case of an exception escaping the block, the error is logged appropriately:

Debug: Apt_key[#{title}]: Started updating
Warning: Apt_key[#{title}]: Updating: Original key not found
Error: Apt_key[#{title}]: Updating failed: #{exception message}
# TODO: update messages to match current log message formats for resource messages

Logging contexts process all exceptions. StandardErrors are assumed to be regular failures in handling a resource, and they are swallowed after logging. Everything else is assumed to be a fatal application-level issue, and is passed up the stack, ending execution. See the ruby documentation for details on which exceptions are not StandardErrors.
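To illustrate those semantics (this is only a sketch, not the actual runtime code), a context method could be built roughly like this:

def updating(title:, message: 'Updating')
  debug(title: title, message: 'Started updating')
  yield
  notice(title: title, message: 'Successfully updated')
rescue StandardError => e
  # a regular failure in handling this resource: log it and swallow it
  error(title: title, exception: e, message: "#{message} failed")
rescue Exception => e
  # a fatal application-level issue: log it and pass it up the stack
  error(title: title, exception: e, message: "#{message} failed")
  raise
end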

The equivalent long-hand form with manual error handling:

logger.updating(title: title)
begin
  if key_not_found
    logger.warning(title: title, message: 'Original key not found')
  end

  # Update the key by calling CLI tool
  try_apt_key(...)

  if $?.exitstatus != 0
    logger.error(title: title, message: "Failed executing apt-key #{...}")
  else
    logger.attribute_changed(
      title:     title,
      attribute: 'content',
      old_value: nil,
      new_value: content_hash,
      message:   "Created with content hash #{content_hash}")
  end
  logger.updated(title: title)
rescue Exception => e # rescue everything; anything that is not a StandardError is re-raised below
  logger.error(title: title, exception: e, message: 'Updating failed')
  raise unless e.is_a? StandardError
end

This example is only for demonstration purposes. In the normal course of operations, implementations should always use the utility functions.

Logging reference

The following action/context methods are available:

  • creating(title:, message: 'Creating', &block)
  • updating(title:, message: 'Updating', &block)
  • deleting(title:, message: 'Deleting', &block)
  • attribute_changed(title:, attribute:, old_value:, new_value:, message: nil)
  • created(title:, message: 'Created')
  • updated(title:, message: 'Updated')
  • deleted(title:, message: 'Deleted')
  • fail(title:, message:) - abort the current context with an error


--------------------



Regards, David

 

Shawn Ferry

Feb 27, 2017, 1:57:50 PM
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
On Feb 27, 2017, at 9:59 AM, David Schmitt <david....@puppet.com> wrote:

Commands

To use CLI commands in a safe and comfortable manner, the implementation can use the commands method to access shell commands. You can either use a full path, or a bare command name. In the latter case puppet will use the system PATH setting to search for the command. If the commands are not available, an error will be raised and the resources will fail in this run. The commands are aware of whether noop is in effect or not, and will skip the actual execution if necessary. 

Puppet::ResourceImplementation.register('apt_key') do
  commands apt_key: '/usr/bin/apt-key'
  commands gpg: 'gpg'

This will create methods called apt_key and gpg, which will take CLI arguments without any escaping, and run them in a safe environment (clean working directory, clean environment). For example to call apt-key to delete a specific key by id:

If the goal here is safe execution, shouldn’t it escape at least shell special characters? My concern is based on some of the providers I work on, which take free-form strings as arguments, and any type that doesn’t fully validate or munge inputs.

There is a risk that any provider* today can suffer from shell command injection. It seems that using shelljoin on the argument array, or shellescape on each element, would be useful. However, for a simplified API it seems that it could be done more centrally here, instead of pushing it to each implementation to make sure it is done if needed.

* That can’t exclude shell characters and doesn’t munge input to escape them. 
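For reference, Ruby's stdlib already provides the kind of escaping described here; a minimal sketch (the untrusted string is made up for illustration):

require 'shellwords'

untrusted = 'key; echo gotcha | wall'
# each element is escaped individually, so the shell sees a single literal argument
system(Shellwords.join(['apt-key', 'del', untrusted]))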


Thanks
Shawn

David Schmitt

Feb 28, 2017, 10:59:27 AM
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
On 27 February 2017 at 18:57, Shawn Ferry <shawn...@oracle.com> wrote:

On Feb 27, 2017, at 9:59 AM, David Schmitt <david....@puppet.com> wrote:

Commands

To use CLI commands in a safe and comfortable manner, the implementation can use the commands method to access shell commands. You can either use a full path, or a bare command name. In the latter case puppet will use the system PATH setting to search for the command. If the commands are not available, an error will be raised and the resources will fail in this run. The commands are aware of whether noop is in effect or not, and will skip the actual execution if necessary. 

Puppet::ResourceImplementation.register('apt_key') do
  commands apt_key: '/usr/bin/apt-key'
  commands gpg: 'gpg'

This will create methods called apt_key and gpg, which will take CLI arguments without any escaping, and run them in a safe environment (clean working directory, clean environment). For example to call apt-key to delete a specific key by id:

If the goal here is safe execution shouldn’t it escape at least shell special characters? My concern is based on some of the providers I work on which take free form strings as arguments and any type that doesn’t fully validate or munge inputs. 

There is a risk that any provider* today can suffer from shell command injection. It seems that using shelljoin on the argument array, or shellescape on each element, would be useful. However, for a simplified API it seems that it could be done more centrally here, instead of pushing it to each implementation to make sure it is done if needed.

* That can’t exclude shell characters and doesn’t munge input to escape them. 

The trick is to not use system(), which calls through the shell, but the array variant of spawn() or popen(), which doesn't go through a shell at all. Not only will that perform much better due to fewer binaries being loaded, it also eliminates all shell parsing from the call chain. Having this wrapped up in the API as the primary mechanism to call CLI commands is intended to help people with that, instead of having them find the right ruby incantation themselves.
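A sketch of the difference, using only stdlib calls (key_id stands in for arbitrary user input):

require 'open3'

# goes through a shell: metacharacters in key_id get interpreted
system("apt-key del #{key_id}")

# array form: no shell involved, arguments reach the binary verbatim
stdout, status = Open3.capture2('apt-key', 'del', key_id)
raise 'apt-key failed' unless status.success?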



Cheers, David

Shawn Ferry

Feb 28, 2017, 11:35:03 AM
to puppe...@googlegroups.com




--
Shawn Ferry
At what point does Puppet::Util::Execution.execute change behavior?

RTFM 4.x is an acceptable answer. I only just got 4.7 integrated into our environment, but glancing at the code it doesn't seem to do anything other than Kernel.exec.










John Bollinger

Feb 28, 2017, 12:07:16 PM
to Puppet Developers


On Monday, February 27, 2017 at 8:59:46 AM UTC-6, David Schmitt wrote:
Hi folks,

I've written down the design in README format now. This should make it easier to ingest from a module developer's perspective. Meanwhile I've also had a shot at a hardcoded implementation of this, and it is technically possible to implement this too.


Thanks for your efforts on this documentation.  It helps clarify your view.  I think you have some good ideas here, but I also think that what you've presented, as you've presented it, is not a viable replacement for the type & provider system we have now.  Moreover, there remain important areas that I would dearly love to see documented, and one or two longtime sore spots that this proposal seems to be missing the opportunity to address.

Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.

Inadequate interface documentation

As I said earlier, one of the biggest problems with the current system is inadequate documentation.  As it now stands, the proposal's documentation does about as well as the docs of the current system.  Missing is information about how the runtime environment is intended to use the defined objects and methods -- for example, under what circumstances it will instantiate / invoke them, how many times per transaction (partially addressed in the doc), at what point(s) in the transaction.  What may the environment do with the object returned by get() and its contents (i.e., must implementations assume that the environment may modify them)?  Is there anything the environment is required to do with them?  What may an implementation do with the object passed to set() and its contents?  Is there any expectation about relationships between the objects provided by get() and received by put()?

Similarly, are the "utilities" provided by the runtime environment supposed to be the only mechanism by which implementations interact with the host Ruby environment?  The only mechanism by which implementations interact with the target system at all?

Pushing new responsibilities onto resource implementations

The minimalistic approach to defining the interface for resource implementations has the inevitable effect of making the host environment incapable of some of the tasks that it presently performs, leaving the onus on resource implementations to fill the gaps.  For example, since there is no way for the environment to retrieve the state of a specified individual resource, and because some resource types cannot feasibly be enumerated, the environment cannot, in general, determine whether resources are in sync.  Only resource implementations can be relied upon to do that, and then only in the context of their put() methods.

Similarly, although I definitely like that a logging utility is made available to resource implementations, it seems that most responsibility for logging now falls on resource implementations as well.  Currently, the environment handles a fair amount of that.  It seems likely that moving the responsibility to resource implementations will not only make them more complicated to write, but will also reduce the consistency of log messages.

Terminology Changes

The proposal seems to gratuitously change established terminology.  What we now call "properties" are to be referred to as "attributes", which itself is currently used in a broader sense.  "Types" become "definitions"; "providers" become "implementations".  Why?  However well or poorly you like the established terminology, it is established.  The proposal seems to be offering variations on these existing things, not entirely new things, and changing all the terminology is likely to be more harmful than helpful.

Specific features

Attribute and parameter validation

Validation ought to be performed during catalog building, thus it needs to be adequately addressed by type "definitions".  I am dissatisfied with the proposal's provision for that.  The Puppet type system is very expressive, but it can also be very arcane.  I maintain that it is a mistake to rely exclusively on the type system for validating "attributes" or "operational parameters".  Moreover, I think the proposal is missing an opportunity to provide for multiple-attribute, multiple-parameter covalidation.  This has been requested on and off for a long time, and this proposal is a great opportunity to finally put it in place.

Transaction progress and state

Resource implementations are afforded only indirect information about transaction context.  For example, the docs say that "The implementation should not try to cache state beyond the transaction", but how, exactly, is the implementation supposed to recognize the end of a transaction?  It would be highly desirable for implementations to have means to at least opt in to messages about transaction lifecycle and events.  For some purposes, it would be useful to have a way for type implementations to be able to communicate with each other within the scope of a transaction.

Details

"Read-only" vs. "init-only"

I understand -- I think -- the distinction you're drawing here, but it's confusing.  Surely if a resource does not yet exist, then all its attributes are logically written when it is created / initialized.  I could understand that is a plausible exception to read-onlyness if not for the explicit provision for init-only attributes.  If you want to maintain both of these cases among the attribute metadata then please rename one of them.  For example, perhaps "derived" or "internal" would describe what you mean by "read-only".

Compound resource names

I suppose that compound namevars are a topic that you hoped to avoid in your push for simplification, though it looks like they might not be hard to support via the interface you have already described.  This is a weak area for the present system, and one that I think any replacement should support well and fully.

Attribute / parameter clashes

By specifying attributes and operational parameters via separate hashes, the proposed system affords the opportunity for name collisions between these.  That also makes resource definitions less cohesive with Puppet overall, because in the Puppet language, attributes and operational parameters are specified together, in the same syntactic construct.  I fully understand the differences between these, but I suggest they not be reflected by a complete separation between their specifications, such as the proposal currently provides.

Resource implementations that become usable during a run

It is only relatively recently that Puppet implemented the long-desired feature of supporting resource providers that become usable as a result of other resources being managed earlier in the same run.  It is not clear whether or how that would be supported under the proposal as currently presented, but this is a feature that I do not think we want to lose.

Environment for commands

If resource implementations are expected to use the 'commands' utility for running system commands, then that utility needs to provide a mechanism for specifying environment variables that must be set in the environment of those commands.  It is furthermore highly desirable that it be possible for the values and maybe even the names of those variables to be drawn from the values of a resource's operational parameters.

Host object

The doc refers in a couple of places to a "host object".  What is this?  What are its methods and properties?  How is it to be used?

Overall

I can see that a lot of thought and effort has gone into this proposal.  I've focused mostly on criticisms, but I want to acknowledge that there's some good work here.

Nevertheless, I regret to say that I dislike the overall proposal in its current form.  It could be improved to some extent by more complete documentation, but documentation notwithstanding, I think it focuses too much on simplification and on the resource itself, and not enough on how resources will be used by and interact with the larger system.  I do not doubt that the proposed API can be implemented, and that's indeed a mark in its favor.  If it is intended only as a simpler alternative to the present system, then many of my more significant concerns are irrelevant.  To the extent that this is being positioned to ultimately replace the current system, however, it doesn't provide everything it needs to do, it misses providing some things it should do, and I'm dissatisfied with the way it provides some of the things it does.


John

David Schmitt

Feb 28, 2017, 12:54:46 PM
to puppe...@googlegroups.com
On 28 February 2017 at 16:34, Shawn Ferry <shawn...@oracle.com> wrote:




On Feb 28, 2017, at 10:59, David Schmitt <david....@puppet.com> wrote:



On 27 February 2017 at 18:57, Shawn Ferry <shawn...@oracle.com> wrote:

On Feb 27, 2017, at 9:59 AM, David Schmitt <david....@puppet.com> wrote:

Commands

To use CLI commands in a safe and comfortable manner, the implementation can use the commands method to access shell commands. You can either use a full path, or a bare command name. In the latter case puppet will use the system PATH setting to search for the command. If the commands are not available, an error will be raised and the resources will fail in this run. The commands are aware of whether noop is in effect or not, and will skip the actual execution if necessary. 

Puppet::ResourceImplementation.register('apt_key') do
  commands apt_key: '/usr/bin/apt-key'
  commands gpg: 'gpg'

This will create methods called apt_key and gpg, which will take CLI arguments without any escaping, and run them in a safe environment (clean working directory, clean environment). For example to call apt-key to delete a specific key by id:

If the goal here is safe execution shouldn’t it escape at least shell special characters? My concern is based on some of the providers I work on which take free form strings as arguments and any type that doesn’t fully validate or munge inputs. 

There is a risk that any provider* today can suffer from shell command injection. It seems that using shelljoin on the argument array, or shellescape on each element, would be useful. However, for a simplified API it seems that it could be done more centrally here, instead of pushing it to each implementation to make sure it is done if needed.

* That can’t exclude shell characters and doesn’t munge input to escape them. 

The trick is to not use system() which calls through the shell, but the array variant of spawn() or popen() that doesn't go through a shell. Not only will that be much better performing due to fewer binaries being loaded, but also eliminates all shell parsing from the call chain. Having this wrapped up in the API as the primary mechanism to call CLI commands is intended to help people with that, instead of having them have to find the right ruby incantation themselves.

At what point does Puppet::Util::Execution.execute change behavior? 

RTFM 4.x is an acceptable answer. I only just got 4.7 integrated into our environment, but glancing at the code it doesn't seem to do anything other than Kernel.exec.


As an existing API, we won't be able to change that quickly. Such a change is also not really in scope for the resource API proposal. I would recommend against using any Puppet::* APIs in new providers, to reduce dependencies on the main puppet code. That's why the `commands` DSL has made it into the new proposal, to provide a more "local" way to call commands.

Cheers, David

Erik Dalén

Feb 28, 2017, 2:35:50 PM
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
Overall I think this looks pretty good, but I have some questions and comments.

For implementations that can't enumerate all existing resources (at least not in any fast way), like for example File, should they define a get method that returns an empty hash, nil, or just not define a get method? nil or not defining it would allow special handling of them, so that would probably be preferable.

But shouldn't it really have a way to fetch the state of a single resource for those implementations? How would this otherwise handle something like "puppet resource file /path/to/file"? Traverse the entire file system and stat every file?

It would also be handy to have some helper method that compares current_state & target_state to filter out the changes that actually need to happen. I would imagine that sort of code would otherwise be duplicated in many implementations.
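A sketch of such a helper, assuming resources are plain key/value hashes identified by a title key:

def changes_needed(current_state, target_state)
  target_state.reject do |target|
    current = current_state.find { |r| r[:title] == target[:title] }
    # keep only resources whose managed attributes differ from the system state
    current && target.all? { |attribute, value| current[attribute] == value }
  end
end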

logger.attribute_changed could really have a default value for message; if the implementation provides attribute, old_value & new_value, they shouldn't all need to supply "Changed #{attribute} from #{old_value} to #{new_value}".

Will the attribute type fields be able to use type definitions shipped in the same module or present in the environment?

Will they be able to inspect the catalog or communicate between implementations, like in the concat & concat_fragment case?

Will they be able to generate resources? That could probably be done with something similar to the logger methods, but there is nothing like that in the proposal. I assume they won't have a generate method.

Will there be some requirement that these implementations are thread safe so puppet can start managing multiple resources in parallel? Or will they run in some subprocess even?


Shawn Ferry

Feb 28, 2017, 4:31:31 PM
to puppe...@googlegroups.com
I’m not calling Puppet::Util::Execution.execute directly. Using defined commands results in a Kernel.exec call.

commands echo: 'echo'
echo("This does bad things;echo this does bad things | wall")

It’s something I’ve been thinking about making a pull request for, but I think the scope of the change is broad enough that the testing is more complicated than I can commit to. I also think it’s desirable, or at least commonly relied-upon, behavior for Exec calls, where it should probably still be allowed.

In any case I’m stuck on a single point when it’s really a broader issue. My main point is that command input could be changed before being passed on to the existing API for execution here, so all new development under this implementation can take advantage of it by default.


Thanks
Shawn




Trevor Vaughan

Mar 1, 2017, 8:46:24 AM
to puppe...@googlegroups.com
As always, if I wait long enough, John will more eloquently provide the response that I wanted to!

I've added a few responses inline


Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.


I hadn't actually caught this, but if not just an oversight, it is indeed concerning. The ability to choose from various providers has been a strong play in Puppet's favor.
 
Inadequate interface documentation

As I said earlier, one of the biggest problems with the current system is inadequate documentation.  As it now stands, the proposal's documentation does about as well as the docs of the current system.  Missing is information about how the runtime environment is intended to use the defined objects and methods -- for example, under what circumstances it will instantiate / invoke them, how many times per transaction (partially addressed in the doc), at what point(s) in the transaction.  What may the environment do with the object returned by get() and its contents (i.e., must implementations assume that the environment may modify them)?  Is there anything the environment is required to do with them?  What may an implementation do with the object passed to set() and its contents?  Is there any expectation about relationships between the objects provided by get() and received by put()?


This is definitely true. Gary's blog has been the seminal resource for figuring out how to write types and providers and, instead of a new interface, it might be good to finish the original docs. That said, I think I remember David saying that part of the intent of this is to provide a language-agnostic interface to managing the system so that users aren't tightly bound to a particular implementation. I think that this is a good goal *but* I would maintain compatibility with existing Type and Provider implementations.

I've also put in requests for the ability to maintain global state across the catalog compilation. If the brunt of the work is moved to the client, this may be more feasible but I think that it's perfectly possible in the current system with a namespaced caching system.

Also, I would like to see the resource utilization on the clients made better and maybe this can be addressed during the update.

 
Pushing new responsibilities onto resource implementations

The minimalistic approach to defining the interface for resource implementations has the inevitable effect of making the host environment incapable of some of the tasks that it presently performs, leaving the onus on resource implementations to fill the gaps.  For example, since there is no way for the environment to retrieve the state of a specified individual resource, and because some resource types cannot feasibly be enumerated, the environment cannot, in general, determine whether resources are in sync.  Only resource implementations can be relied upon to do that, and then only in the context of their put() methods.

I'm assuming (possibly incorrectly) that there would be a standard Ruby back end example provided. I'll also throw down the gauntlet and say that it should be the File resource. Being one of the oldest, and most janky, native types, I feel that it should be the seminal example. If you do 'easy' ones, I'll look at this askance, if you tackle (and improve) the core types, I'll have something solid to reference.
 

Similarly, although I definitely like that a logging utility is made available to resource implementations, it seems that most responsibility for logging now falls on resource implementations as well.  Currently, the environment handles a fair amount of that.  It seems likely that moving the responsibility to resource implementations will not only make them more complicated to write, but will also reduce the consistency of log messages.

Seems like a reasonable request for a stable exposed logging API
 

Terminology Changes

The proposal seems to gratuitously change established terminology.  What we now call "properties" are to be referred to as "attributes", which itself is currently used in a broader sense.  "Types" become "definitions"; "providers" become "implementations".  Why?  However well or poorly you like the established terminology, it is established.  The proposal seems to be offering variations on these existing things, not entirely new things, and changing all the terminology is likely to be more harmful than helpful.

I've got to say that I didn't notice this until you mentioned it but, now that you have, I'm also scratching my head. The current names make sense and are documented. I do however, think that the new names are more accessible to new users.
 

Specific features

Attribute and parameter validation

Validation ought to be performed during catalog building, thus it needs to be adequately addressed by type "definitions".  I am dissatisfied with the proposal's provision for that.  The Puppet type system is very expressive, but it can also be very arcane.  I maintain that it is a mistake to rely exclusively on the type system for validating "attributes" or "operational parameters".  Moreover, I think the proposal is missing an opportunity to provide for multiple-attribute, multiple-parameter covalidation.  This has been requested on and off for a long time, and this proposal is a great opportunity to finally put it in place.

This was one of my biggest worries early on and it has been solidified with the example code. The Type system simply isn't suitable for complex validation at this point. I need logic (sometimes stupid amounts of logic) and, until I can do arbitrary validation with the Type system, I can't use it for my parameters.

Also, this is starting to look suspiciously like the MCollective DDL, which doesn't bother me but I find amusing.
 

Transaction progress and state

Resource implementations are afforded only indirect information about transaction context.  For example, the docs say that "The implementation should not try to cache state beyond the transaction", but how, exactly, is the implementation supposed to recognize the end of a transaction?  It would be highly desirable for implementations to have means to at least opt in to messages about transaction lifecycle and events.  For some purposes, it would be useful to have a way for type implementations to be able to communicate with each other within the scope of a transaction.

This is absolutely necessary! As much as we would like to keep things in nice little boxes, they have interactions and we *need* those interactions. Resources should know the following:

1) Where they are in the catalog compile stage and what they can access
2) Where they are in the list of similar resource types and what they can access
3) Where they are in the graph and what they can access

As we've grafted this ability into Puppet, we've gained the ability to do things like native 'concat' as well as rolling up resources to be able to fail unless all resource data is properly defined. I very strongly believe that a floating data type that is catalog accessible would be a good fundamental addition to the Puppet language.
 

Details

"Read-only" vs. "init-only"

I understand -- I think -- the distinction you're drawing here, but it's confusing.  Surely if a resource does not yet exist, then all its attributes are logically written when it is created / initialized.  I could understand that is a plausible exception to read-onlyness if not for the explicit provision for init-only attributes.  If you want to maintain both of these cases among the attribute metadata then please rename one of them.  For example, perhaps "derived" or "internal" would describe what you mean by "read-only".

Compound resource names

I suppose that compound namevars are a topic that you hoped to avoid in your push for simplification, though it looks like they might not be hard to support via the interface you have already described.  This is a weak area for the present system, and one that I think any replacement should support well and fully.

Fundamentally, Puppet uses relational database concepts in the basic resource model and, if thought about as such, can use the usual tricks to support uniqueness using multi-key resources.
 

Attribute / parameter clashes

By specifying attributes and operational parameters via separate hashes, the proposed system affords the opportunity for name collisions between these.  That also makes resource definitions less cohesive with Puppet overall, because in the Puppet language, attributes and operational parameters are specified together, in the same syntactic construct.  I fully understand the differences between these, but I suggest they not be reflected by a complete separation between their specifications, such as the proposal currently provides.

I think I'm OK with this provided that the validator flags any name collisions as a compile failure similar to what properties and parameters do now.
 

Resource implementations that become usable during a run

It is only relatively recently that Puppet implemented the long-desired feature of supporting resource providers that become usable as a result of other resources being managed earlier in the same run.  It is not clear whether or how that would be supported under the proposal as currently presented, but this is a feature that I do not think we want to lose.

Ditto. I really need this capability to remain.
 

Trevor Vaughan

unread,
Mar 1, 2017, 9:00:45 AM3/1/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
Additional comments in response to Erik

On Tue, Feb 28, 2017 at 2:35 PM, Erik Dalén <erik.gus...@gmail.com> wrote:
Overall I think this looks pretty good, but I have some questions and comments.

For implementations that can't enumerate all existing resources (at least not in any fast way), like for example File, should they define a get method that returns an empty hash, nil, or just not define a get method? nil or not defining it would allow special handling of them, so that would probably be preferable.

So, this is a fundamental failure of the File resource being too tightly coupled to the Object model of Puppet. We could, relatively easily, get around this but it would require working around the Object model to provide point optimization.

I'm starting to notice that a lot of people are now passing around hashes to prevent spawning a lot of resources and I think that this fundamental issue should be addressed prior to the implementation of another Type/Provider subsystem. It is possible that another system will *need* to be provided to fully address these issues and that would be a change that I could wholeheartedly get behind.
 

But shouldn't it really have a way to fetch the state of a single resource for those implementations? How would this otherwise handle something like "puppet resource file /path/to/file"? Traverse the entire file system and stat every file?

Yes, this is very much needed. The ability that the community has basically shoved into Puppet in regards to catalog delving is one of the things that helps Puppet remain relevant across various environments.
 

It would also be handy to have some helper method that compares current_state & target_state to filter out the changes that actually need to happen. I would imagine that sort of code would otherwise be duplicated in many implementations.

I've had to have trailing resource collectors (as does concat) that simply exist to graft together a bunch of previously declared resources. This really needs to be eliminated, and we need to be able to mark resources as irrelevant to the client so that the data, specifically, cannot bloat the catalog. This *should* be relatively straightforward with an extension to the existing object model, but I haven't had time to try it.
 

logger.attribute_changed could really have a default value for message; if the implementation provides attribute, old_value & new_value, they shouldn't all need to supply "Changed #{attribute} from #{old_value} to #{new_value}".

Agreed
 

Will the attribute type fields be able to use type definitions shipped in the same module or present in the environment?

I still think that types just aren't enough at this point without arbitrary validation capabilities, and the ability to not repeat my type definitions all over the place.
 

Will they be able to inspect the catalog or communicate between implementations, like in the concat & concat_fragment case?

100% need this. If I can't do this, most of the advanced functionality that I have dies. The catalog is one of the shining points in Puppet's favor in terms of monitoring and compliance. I *need* the ability to dig into it as necessary and I would love to see that ability codified instead of having to re-hack my way into it when it changes. Also, this is something that I need at run time. Waiting until it's stuffed in a random DB somewhere is too late.

From the top of my head, I use this for: Concat, IPTables, CGroups, Kerberos, and Compliance Validation

Losing this would be a *BIG DEAL*
 

Will they be able to generate resources? That could probably be done with something similar to the logger methods, but there is nothing like that in the proposal. I assume they won't have a generate method.

I also use this quite a bit. I know that it's not as easy to debug but it is occasionally necessary to maintain the proper object model and not resort to hacks that may conflict with other resource declarations.
 

Will there be some requirement that these implementations are thread safe so puppet can start managing multiple resources in parallel? Or will they run in some subprocess even?

I'm honestly not so concerned about this. I used to be all for running my catalogs in parallel but I quickly realized that the slight speed gains that I might get (I'll probably be blocked by I/O somewhere anyway) would probably not be worth the headache of trying to debug a parallel catalog gone bad. In my mind, parallelism in Puppet should be thought of at the cross-node level, and the catalogs should be moved to a direct heap/stack implementation to reduce resource usage on the clients and to gain efficiency.
 


Erik Dalén

Mar 1, 2017, 9:12:22 AM
to puppe...@googlegroups.com
On Wed, 1 Mar 2017 at 14:46 Trevor Vaughan <tvau...@onyxpoint.com> wrote:
As always, if I wait long enough, John will more eloquently provide the response that I wanted to!

I've added a few responses inline


Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.


I hadn't actually caught this, but if not just an oversight, it is indeed concerning. The ability to choose from various providers has been a strong play in Puppet's favor.

Although in some cases it is a bit annoying, for example not being able to install a gem and a deb with the same name on a host. This specific case was fixed in https://tickets.puppetlabs.com/browse/PUP-1073 but still exists for types other than package.
In many cases the providers actually manage different things, not mutually exclusive variants of the same thing.
But having deb_package and rpm_package would definitely just be annoying if you wanted a module that works across multiple distros. Putting both providers into the same implementation file would be a bit ugly, but could work I guess.
 
 
Inadequate interface documentation

As I said earlier, one of the biggest problems with the current system is inadequate documentation.  As it now stands, the proposal's documentation does about as well as the docs of the current system.  Missing is information about how the runtime environment is intended to use the defined objects and methods -- for example, under what circumstances it will instantiate / invoke them, how many times per transaction (partially addressed in the doc), at what point(s) in the transaction.  What may the environment do with the object returned by get() and its contents (i.e., must implementations assume that the environment may modify them)?  Is there anything the environment is required to do with them?  What may an implementation do with the object passed to set() and its contents?  Is there any expectation about relationships between the objects provided by get() and received by put()?


This is definitely true. Gary's blog has been the seminal resource for figuring out how to write types and providers and, instead of a new interface, it might be good to finish the original docs. That said, I think I remember David saying that part of the intent of this is to provide a language-agnostic interface to managing the system so that users aren't tightly bound to a particular implementation. I think that this is a good goal *but* I would maintain compatibility with existing Type and Provider implementations.

There's a bunch of things that can't be solved in any good way in the old type and provider system that look like they would be pretty easy to solve in this new system. For example this old feature request: https://projects.puppetlabs.com/issues/2198


Specific features

Attribute and parameter validation

Validation ought to be performed during catalog building, thus it needs to be adequately addressed by type "definitions".  I am dissatisfied with the proposal's provision for that.  The Puppet type system is very expressive, but it can also be very arcane.  I maintain that it is a mistake to rely exclusively on the type system for validating "attributes" or "operational parameters".  Moreover, I think the proposal is missing an opportunity to provide for multiple-attribute, multiple-parameter covalidation.  This has been requested on and off for a long time, and this proposal is a great opportunity to finally put it in place.

This was one of my biggest worries early on and it has been solidified with the example code. The Type system simply isn't suitable for complex validation at this point. I need logic (sometimes stupid amounts of logic) and, until I can do arbitrary validation with the Type system, I can't use it for my parameters.

If all attributes were defined in a single type it could handle a lot more situations (although the type signature could get massive). Consider this simplified example for a file type:

Variant[
  Struct[{
    ensure => Enum['link'],
    target => String,
  }],
  Struct[{
    ensure => Enum['file'],
    content => String,
  }],
  Struct[{
    ensure => Enum['file'],
    source => String,
  }],
]

That would only allow the target attribute if ensure is "link", and only allow either content or source if ensure is "file".
Accounting for all possible attributes on the file type would probably span a few pages though :)
 

Trevor Vaughan

Mar 1, 2017, 11:22:02 AM
to puppe...@googlegroups.com
On Wed, Mar 1, 2017 at 9:12 AM, Erik Dalén <erik.gus...@gmail.com> wrote:


On Wed, 1 Mar 2017 at 14:46 Trevor Vaughan <tvau...@onyxpoint.com> wrote:
As always, if I wait long enough, John will more eloquently provide the response that I wanted to!

I've added a few responses inline


Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.


I hadn't actually caught this, but if not just an oversight, it is indeed concerning. The ability to choose from various providers has been a strong play in Puppet's favor.

Although in some cases it is a bit annoying, for example not being able to install a gem and deb with the same name on a host. This specific case was fixed in https://tickets.puppetlabs.com/browse/PUP-1073 but still exists for other types than package.
In many cases the providers actually manage different things, not mutually exclusive variants of the same thing.
But having deb_package and rpm_package would definitely just be annoying if you wanted a module that works across multiple distros. Putting both providers into the same implementation file would be a bit ugly, but could work I guess.
 
 
Inadequate interface documentation

As I said earlier, one of the biggest problems with the current system is inadequate documentation.  As it now stands, the proposal's documentation does about as well as the docs of the current system.  Missing is information about how the runtime environment is intended to use the defined objects and methods -- for example, under what circumstances it will instantiate / invoke them, how many times per transaction (partially addressed in the doc), at what point(s) in the transaction.  What may the environment do with the object returned by get() and its contents (i.e., must implementations assume that the environment may modify them)?  Is there anything the environment is required to do with them?  What may an implementation do with the object passed to set() and its contents?  Is there any expectation about relationships between the objects provided by get() and received by put()?


This is definitely true. Gary's blog has been the seminal resource for figuring out how to write types and providers and, instead of a new interface, it might be good to finish the original docs. That said, I think I remember David saying that part of the intent of this is to provide a language-agnostic interface to managing the system so that users aren't tightly bound to a particular implementation. I think that this is a good goal *but* I would maintain compatibility with existing Type and Provider implementations.

There's a bunch of things that can't be solved in any good way in the old type and provider system that look like they would be pretty easy to solve in this new system. For example this old feature request: https://projects.puppetlabs.com/issues/2198


For that ticket in particular, changing from a strict object model to a more lookup-oriented scenario should fix a lot of things. If a provider does something *different*, in theory that should be a different resource. Also, many resources are by nature simply tied together, and special logic will need to be in play to handle those relationships.

So, for that ticket in particular, using the model that is used in Concat, and SIMP IPTables among others, the 'package' resources should be *data* and the culmination of all types of packages for type 'yum, rpm, deb, foo...' should be that the correct command is run in batch mode.

However, this means that if any one of those objects fails, then everything downstream from all of them fails and this is a bad thing. There would just need to be a LOT more back end logic and the ability to fail parts of the resource tree.

I still think that the object model is the problem, not this particular interface to it.
 


Specific features

Attribute and parameter validation

Validation ought to be performed during catalog building, thus it needs to be adequately addressed by type "definitions".  I am dissatisfied with the proposal's provision for that.  The Puppet type system is very expressive, but it can also be very arcane.  I maintain that it is a mistake to rely exclusively on the type system for validating "attributes" or "operational parameters".  Moreover, I think the proposal is missing an opportunity to provide for multiple-attribute, multiple-parameter covalidation.  This has been requested on and off for a long time, and this proposal is a great opportunity to finally put it in place.

This was one of my biggest worries early on and it has been solidified with the example code. The Type system simply isn't suitable for complex validation at this point. I need logic (sometimes stupid amounts of logic) and, until I can do arbitrary validation with the Type system, I can't use it for my parameters.

If all attributes were defined in a single type it could handle a lot more situations (although the type signature could get massive). Consider this simplified example for a file type:

Variant[
  Struct[{
    ensure => Enum['link'],
    target => String,
  }],
  Struct[{
    ensure => Enum['file'],
    content => String,
  }],
  Struct[{
    ensure => Enum['file'],
    source => String,
  }],
]

That would only allow the target attribute if ensure is "link", and only allow either content or source if ensure is "file".
Accounting for all possible attributes on the file type would probably span a few pages though :)

I get it, but take a look down this monstrosity stack and tell me that it wouldn't be WAY easier if I could just call a few Ruby functions: https://github.com/simp/pupmod-simp-simplib/tree/master/types/ip
 

David Schmitt

Mar 2, 2017, 6:15:00 AM
to puppe...@googlegroups.com
Yup. The new API will only accept array input.


Cheers, David
 



David Schmitt

Mar 2, 2017, 8:29:08 AM
to puppe...@googlegroups.com
On 28 February 2017 at 17:07, John Bollinger <John.Bo...@stjude.org> wrote:


On Monday, February 27, 2017 at 8:59:46 AM UTC-6, David Schmitt wrote:
Hi folks,

I've written down the design in README format now. This should make it easier to ingest from a module developer's perspective. Meanwhile I've also had a shot at a hardcoded implementation of this, and it is technically possible to implement this too.


Thanks for your efforts on this documentation.  It helps clarify your view.  I think you have some good ideas here, but I also think that what you've presented, as you've presented it, is not a viable replacement for the type & provider system we have now.  Moreover, there remain important areas that I would dearly love to see documented, and one or two longtime sore spots that this proposal seems to be missing the opportunity to address.

This is only a v1. It is presented with the intent to build a simpler, more approachable foundation for future work.
 
Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.
 
Looking at which types provide multiple implementations, I mostly found types from core puppet. Even looking at those (e.g. package, user, service), they have many attributes that are provider specific, making the abstractions very leaky. Other examples in the wider software world (e.g. SQL mappers, vendor-independent configuration schemes) also highlight the issues with one-size-fits-all approaches: to get the most out of a specific system, users require full access to the lower layer's capabilities. Having said that, this does not invalidate the use-cases where broad applicability trumps those special cases. Puppet already has a perfectly reasonable DSL to deal with those cases. For example, a hypothetical successor to the package type could look like this:

define package (
  Ensure $ensure,
  Enum['apt', 'rpm'] $provider, # have a hiera 5 dynamic binding to a function choosing a sensible default for the current system
  Optional[String] $source  = undef,
  Optional[String] $version = undef,
  Optional[Hash] $options   = { },
) {
  case $provider {
    'apt': {
      package_apt { $title:
        ensure          => $ensure,
        source          => $source,
        version         => $version,
        *               => $options,
      }
    }
    'rpm': {
      if $version != undef { fail("RPM doesn't support \$version") }
      package_rpm { $title:
        ensure => $ensure,
        source => $source,
        *      => $options,
      }
      # ...
    }
  }
}


This would provide a best-of-both-worlds approach that exposes the subtleties of particular systems, and the full use of those features, without overloading a common interface, while still providing a quick and simple top-level entry point if you don't need that level of detail.

 
Inadequate interface documentation

As I said earlier, one of the biggest problems with the current system is inadequate documentation.  As it now stands, the proposal's documentation does about as well as the docs of the current system.

Thank you for this praise, seeing as I'm up against a team of world-class tech writers, and legends like Dan Bode :-D
 
  Missing is information about how the runtime environment is intended to use the defined objects and methods -- for example, under what circumstances it will instantiate / invoke them, how many times per transaction (partially addressed in the doc), at what point(s) in the transaction.  What may the environment do with the object returned by get() and its contents (i.e., must implementations assume that the environment may modify them)?  Is there anything the environment is required to do with them?  What may an implementation do with the object passed to set() and its contents?  Is there any expectation about relationships between the objects provided by get() and received by set()?

Very good questions, all of them. I haven't gotten into those details yet. At a high level I'd expect the data passed around from get() and to set() to be passed "by value": no references handed out by get() will be valid within set(), and nothing depends on a specific behaviour of the implementation around those references. You are completely right that these details need to be written out in the docs.
 

Similarly, are the "utilities" provided by the runtime environment supposed to be the only mechanism by which implementations interact with the host Ruby environment?

The "host Ruby environment" is a large scope. For example interaction with the current catalog during evaluation is definitely on the list of things I currently don't want to address (although there are good reasons for some to do so, e.g. concat). On the other hand, I don't want to keep a implementation from using ruby's general capabilities. Do you have specific things in mind?
 
  The only mechanism by which implementations interact with the target system at all?

Not at all. Implementations are free to use ruby in any way they see fit. There is no value in e.g. prescribing a way of talking to an HTTP/REST API. Looking at the most interesting examples (AWS, Azure, etc.), those projects already bring their own SDKs, so there is no way to improve on that anyway.


Pushing new responsibilities onto resource implementations

The minimalistic approach to defining the interface for resource implementations has the inevitable effect of making the host environment incapable of some of the tasks that it presently performs, leaving the onus on resource implementations to fill the gaps.  For example, since there is no way for the environment to retrieve the state of a specified individual resource, and because some resource types cannot feasibly be enumerated, the environment cannot, in general, determine whether resources are in sync.  Only resource implementations can be relied upon to do that, and then only in the context of their set() methods.

Yes. This is a conscious decision to keep the v1 simple. Evolving the interface later by adding more capabilities is easier than fixing a rushed design. For many resources, getting all of them is as costly as getting a single one, e.g. apt_key.


Similarly, although I definitely like that a logging utility is made available to resource implementations, it seems that most responsibility for logging now falls on resource implementations as well.  Currently, the environment handles a fair amount of that.  It seems likely that moving the responsibility to resource implementations will not only make them more complicated to write, but will also reduce the consistency of log messages.

The current reporting from the puppet agent is entirely aspirational. Log messages are emitted when values are passed to the provider, not when changes are effected. As the new resources will have to run alongside existing resources for a long while, this will not go away. Logging from the implementation will allow more accurate reporting about what actually happens. I can totally see simple implementations skipping logging altogether, although I would not recommend that. To support people new to provider development, the API beyond the basic messaging functions is intended to provide guidance on how to log, especially around error handling.
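To sketch what that guidance might look like: the logger utility is part of the proposal, but the exact argument list below (attribute, old_value, new_value, message) is my assumption based on this thread, not a final API.

```ruby
# Hedged sketch of logging from within set(); names and order are assumed.
def set(current_state, target_state, noop = false)
  target_state.each do |resource|
    # ... enforce the change here, then report what actually happened:
    logger.attribute_changed(resource[:name], :ensure, 'absent', 'present',
      message: "created #{resource[:name]}")
  end
end
```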
 

Terminology Changes

The proposal seems to gratuitously change established terminology.  What we now call "properties" are to be referred to as "attributes", which itself is currently used in a broader sense.  "Types" become "definitions"; "providers" become "implementations".  Why?  However well or poorly you like the established terminology, it is established.  The proposal seems to be offering variations on these existing things, not entirely new things, and changing all the terminology is likely to be more harmful than helpful.

Partially, this was because those are not the same things; partially, to get a feel for people's reactions. Yours speaks clearly.
 

Specific features

Attribute and parameter validation

Validation ought to be performed during catalog building, thus it needs to be adequately addressed by type "definitions".  I am dissatisfied with the proposal's provision for that.  The Puppet type system is very expressive, but it can also be very arcane.  I maintain that it is a mistake to rely exclusively on the type system for validating "attributes" or "operational parameters".  Moreover, I think the proposal is missing an opportunity to provide for multiple-attribute, multiple-parameter covalidation.  This has been requested on and off for a long time, and this proposal is a great opportunity to finally put it in place.

Full validation cannot be performed during catalog building. Accepting that, the conversation turns to where to put the different pieces. One of the design constraints is to minimize ruby code on the master for environment isolation. That said, multiple-attribute, multiple-parameter covalidation definitely is on my list of things I want to support server-side in the future. One possibility would be to lift the definition part into the puppet DSL and do validation there. If we get all the infrastructure for that in place, it'll be an automatable upgrade from the definitions described here, though it will require even more tooling for backwards compatibility. I'm not comfortable delaying this effort further for features that can be seamlessly retrofitted later.
 

Transaction progress and state

Resource implementations are afforded only indirect information about transaction context.  For example, the docs say that "The implementation should not try to cache state beyond the transaction", but how, exactly, is the implementation supposed to recognize the end of a transaction?
  It would be highly desirable for implementations to have means to at least opt in to messages about transaction lifecycle and events. 


This has come up multiple times now, and I'll add an optional end-of-transaction lifecycle method to the implementation. I'll need to figure out how to hook that up to puppet though. Any suggestions on naming?
 

 
For some purposes, it would be useful to have a way for type implementations to be able to communicate with each other within the scope of a transaction.

Cooperating implementations can use regular ruby mechanisms, like singletons in a ruby module or class, to communicate. In a polyglot world where some or all providers are running outside the main agent process, that'll cease to be a viable way to solve these problems. Maybe it'll turn out that we need to keep the old API around indefinitely, or maybe we'll find better ways to solve these problems. For example, native concat could very well be implemented as a catalog transformation that runs on the server after compilation, collects all fragments, and *replaces* them in the catalog with the final file. The fact that all these nifty things currently run as types and providers does not prevent us from exploring better solutions for the problems they are solving. And the old APIs will remain in place for quite a while.
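For illustration, such a singleton could be as small as this. The names are invented; this is the plain-ruby mechanism described above, not a proposed API:

```ruby
# A process-wide store that two implementations can both see for the
# duration of the agent run.
module FragmentStore
  def self.fragments
    @fragments ||= []
  end
end

# one implementation contributes during its set():
FragmentStore.fragments << { target: '/etc/motd', content: "hello\n" }

# another reads the collected state later in the same process:
FragmentStore.fragments.each { |fragment| puts fragment[:target] }
```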
 
Details

"Read-only" vs. "init-only"

I understand -- I think -- the distinction you're drawing here, but it's confusing.  Surely if a resource does not yet exist, then all its attributes are logically written when it is created / initialized.  I could understand that is a plausible exception to read-onlyness if not for the explicit provision for init-only attributes.  If you want to maintain both of these cases among the attribute metadata then please rename one of them.  For example, perhaps "derived" or "internal" would describe what you mean by "read-only".

I've added more explanation around read_only and init_only:

  * `init_only`: this attribute can only be set during creation of the resource. Its value will be reported going forward, but trying to change it later will lead to an error. For example, the base image for a VM, or the UID of a user.
  * `read_only`: values for this attribute will be returned by `get()`, but `set()` is not able to change them. Values for this should never be specified in a manifest. For example the checksum of a file, or the MAC address of a network interface.

I hope this makes the distinction clearer.
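For illustration, in a data-only definition these flags might appear like this. The surrounding key names are assumptions based on this thread, not the final API:

```ruby
Puppet::SimpleResource.define(
  name: 'vm_instance',
  attributes: {
    ensure:     { type: 'Enum[present, absent]' },
    base_image: { type: 'String', init_only: true }, # settable only at creation
    mac:        { type: 'String', read_only: true }, # reported by get(), never settable
  }
)
```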


Compound resource names

I suppose that compound namevars are a topic that you hoped to avoid in your push for simplification, though it looks like they might not be hard to support via the interface you have already described.  This is a weak area for the present system, and one that I think any replacement should support well and fully.

Composite namevars are a topic dear to my heart (see https://groups.google.com/d/msg/puppet-dev/xQN5nTHrIc0/bUplrjQj7sAJ for historical context, and https://github.com/puppetlabs/puppet-specifications/blob/master/language/resource_types.md#title-patterns for the current (incomplete) spec). Exactly because they'll be easy to support, they're not in the v1. I've started a "Known Limitations" section in the README to collect all these points and arguments. It also provides a good starting point for anyone who wants to get involved.
 

Attribute / parameter clashes

By specifying attributes and operational parameters via separate hashes, the proposed system affords the opportunity for name collisions between these.  That also makes resource definitions less cohesive with Puppet overall, because in the Puppet language, attributes and operational parameters are specified together, in the same syntactic construct.  I fully understand the differences between these, but I suggest they not be reflected by a complete separation between their specifications, such as the proposal currently provides.

Excellent point, I didn't think of the uniqueness requirements here. I'll fold them into a single structure.
 

Resource implementations that become usable during a run

It is only relatively recent that Puppet implemented the long-desired feature of supporting resource providers that become usable as a result of other resources being managed earlier in the same run.  It is not clear whether or how that would be supported under the proposal as currently presented, but this is a feature that I do not think we want to lose.

To quote from the readme:


To allow for a transaction to set up the prerequisites for an implementation, and use it immediately, the provider is instantiated as late as possible.

Together with autoloading, this should address the situation. It might be necessary to put `require` statements into the body of the registered block, instead of the beginning of the file.
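A minimal sketch of that pattern, assuming the implement() shape from the gist (the resource and gem names are invented):

```ruby
Puppet::SimpleResource.implement('some_api_resource') do
  def get
    # resolved only when the implementation is first instantiated, which can
    # be after an earlier resource in the same run has installed the gem
    require 'some_vendor_sdk'
    # ... enumerate resources via the SDK ...
    []
  end
end
```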


Environment for commands

If resource implementations are expected to use the 'commands' utility for running system commands, then that utility needs to provide a mechanism for specifying environment variables that must be set in the environment of those commands.

Good call on the environment support. I've added it like this:

To pass additional environment variables through to the command, pass a hash
of them as `env:`:

```ruby
apt_key 'del', key_id, env: { 'LC_ALL' => 'C' }
```
 
  It is furthermore highly desirable that it be possible for the values and maybe even the names of those variables to be drawn from the values of a resource's operational parameters.

The ruby namespace for those methods is shared between all resource instances. An alternative would be a way to access the commands functionality without having to pre-specify the command name, e.g.:

`run_command '/usr/bin/apt-key', 'del', key_id`

That could be used within the get/set methods to access dynamic values.
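For illustration, that variant could then draw its arguments from resource values inside set(). Everything but the proposed run_command name is invented:

```ruby
def set(current_state, target_state, noop = false)
  target_state.each do |resource|
    # command arguments come from the resource's own values
    run_command '/usr/bin/apt-key', 'del', resource[:id] unless noop
  end
end
```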

 

Host object

The doc refers in a couple of places to a "host object".  What is this?  What are its methods and properties?  How is it to be used?

The ruby object hosting the get and set methods. I've changed the wording to make that clearer: "The object instance hosting the `get` and `set` methods can be used to cache ephemeral state during..."
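For illustration (method names invented), that allows the classic memoisation pattern:

```ruby
def get
  # computed once per transaction, reused across calls on the same instance
  @all_keys ||= enumerate_keys_from_system
end
```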
 

Overall

I can see that a lot of thought and effort has gone into this proposal.  I've focused mostly on criticisms, but I want to acknowledge that there's some good work here.

Nevertheless, I regret to say that I dislike the overall proposal in its current form.  It could be improved to some extent by more complete documentation, but documentation notwithstanding, I think it focuses too much on simplification and on the resource itself, and not enough on how resources will be used by and interact with the larger system.  I do not doubt that the proposed API can be implemented, and that's indeed a mark in its favor.  If it is intended only as a simpler alternative to the present system, then many of my more significant concerns are irrelevant.  To the extent that this is being positioned to ultimately replace the current system, however, it doesn't provide everything it needs to do, it misses providing some things it should do, and I'm dissatisfied with the way it provides some of the things it does.

This API is not a full replacement for the power of 3.x style types and providers. In the end, the goal of the new Resource API is not to be a complete replacement of prior art, but a cleaner way to get good results for the majority of simple cases.



Thank you for the time and work you're investing in reviewing this! It already has made the thing better.


Regards, David

David Schmitt

unread,
Mar 2, 2017, 8:56:29 AM3/2/17
to puppe...@googlegroups.com
On 1 March 2017 at 13:46, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
As always, if I wait long enough, John will more eloquently provide the response that I wanted to!

I've added a few responses inline


Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.


I hadn't actually caught this, but if not just an oversight, it is indeed concerning. The ability to choose from various providers has been a strong play in Puppet's favor.

See my answer to John's point.
 
 
Inadequate interface documentation

As I said earlier, one of the biggest problems with the current system is inadequate documentation.  As it now stands, the proposal's documentation does about as well as the docs of the current system.  Missing is information about how the runtime environment is intended to use the defined objects and methods -- for example, under what circumstances it will instantiate / invoke them, how many times per transaction (partially addressed in the doc), at what point(s) in the transaction.  What may the environment do with the object returned by get() and its contents (i.e., must implementations assume that the environment may modify them)?  Is there anything the environment is required to do with them?  What may an implementation do with the object passed to set() and its contents?  Is there any expectation about relationships between the objects provided by get() and received by set()?


This is definitely true. Gary's blog has been the seminal resource for figuring out how to write types and providers and, instead of a new interface, it might be good to finish the original docs. That said, I think I remember David saying that part of the intent of this is to provide a language-agnostic interface to managing the system so that users aren't tightly bound to a particular implementation. I think that this is a good goal *but* I would maintain compatibility with existing Type and Provider implementations.

Existing types&providers will not go away for a long time. My prototype translates attributes into newproperty calls and shoves the get&set methods into a 3.x type. For everyone else in the agent or server this will look and behave exactly like a 3.x type.
 

I've also put in requests for the ability to maintain global state across the catalog compilation. If the brunt of the work is moved to the client, this may be more feasible but I think that it's perfectly possible in the current system with a namespaced caching system.

I've discussed this, and possible solutions in my answer to John.
 

Also, I would like to see the resource utilization on the clients made better and maybe this can be addressed during the update.

This interface provides the possibility of batching. The current agent runtime obviously cannot take advantage of that yet, but it is a chicken-and-egg problem. Getting this going would be a huge payoff.
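For illustration, because set() receives all out-of-sync resources in one call, an implementation could batch them. The command helper and attribute names are invented:

```ruby
def set(current_state, target_state, noop = false)
  to_install = target_state.select { |r| r[:ensure] == 'present' }
                           .map { |r| r[:name] }
  # one bulk invocation instead of one process spawn per resource
  apt_get 'install', '-y', *to_install unless noop || to_install.empty?
end
```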

Pushing new responsibilities onto resource implementations

The minimalistic approach to defining the interface for resource implementations has the inevitable effect of making the host environment incapable of some of the tasks that it presently performs, leaving the onus on resource implementations to fill the gaps.  For example, since there is no way for the environment to retrieve the state of a specified individual resource, and because some resource types cannot feasibly be enumerated, the environment cannot, in general, determine whether resources are in sync.  Only resource implementations can be relied upon to do that, and then only in the context of their set() methods.

I'm assuming (possibly incorrectly) that there would be a standard Ruby back end example provided. I'll also throw down the gauntlet and say that it should be the File resource. Being one of the oldest, and most janky, native types, I feel that it should be the seminal example. If you do 'easy' ones, I'll look at this askance, if you tackle (and improve) the core types, I'll have something solid to reference.

The v1 is not intended to solve all the nastiness that is required for the implementation of file. There are many use-cases that will never need that level of functionality. I'm also convinced that the current level of *complexity* in types and providers is not required for that level of functionality.
 
 

Similarly, although I definitely like that a logging utility is made available to resource implementations, it seems that most responsibility for logging now falls on resource implementations as well.  Currently, the environment handles a fair amount of that.  It seems likely that moving the responsibility to resource implementations will not only make them more complicated to write, but will also reduce the consistency of log messages.

Seems like a reasonable request for a stable exposed logging API

?
 
 

Terminology Changes

The proposal seems to gratuitously change established terminology.  What we now call "properties" are to be referred to as "attributes", which itself is currently used in a broader sense.  "Types" become "definitions"; "providers" become "implementations".  Why?  However well or poorly you like the established terminology, it is established.  The proposal seems to be offering variations on these existing things, not entirely new things, and changing all the terminology is likely to be more harmful than helpful.

I've got to say that I didn't notice this until you mentioned it but, now that you have, I'm also scratching my head. The current names make sense and are documented. I do however, think that the new names are more accessible to new users.

Thanks.
 
 

Specific features

Attribute and parameter validation

Validation ought to be performed during catalog building, thus it needs to be adequately addressed by type "definitions".  I am dissatisfied with the proposal's provision for that.  The Puppet type system is very expressive, but it can also be very arcane.  I maintain that it is a mistake to rely exclusively on the type system for validating "attributes" or "operational parameters".  Moreover, I think the proposal is missing an opportunity to provide for multiple-attribute, multiple-parameter covalidation.  This has been requested on and off for a long time, and this proposal is a great opportunity to finally put it in place.

This was one of my biggest worries early on and it has been solidified with the example code. The Type system simply isn't suitable for complex validation at this point. I need logic (sometimes stupid amounts of logic) and, until I can do arbitrary validation with the Type system, I can't use it for my parameters.

Does that validation have to be server side?
 
Also, this is starting to look suspiciously like the MCollective DDL, which doesn't bother me but I find amusing.

There is only so much you can do to a type system :-D
 
Transaction progress and state

Resource implementations are afforded only indirect information about transaction context.  For example, the docs say that "The implementation should not try to cache state beyond the transaction", but how, exactly, is the implementation supposed to recognize the end of a transaction?  It would be highly desirable for implementations to have means to at least opt in to messages about transaction lifecycle and events.  For some purposes, it would be useful to have a way for type implementations to be able to communicate with each other within the scope of a transaction.

This is absolutely necessary! As much as we would like to keep things in nice little boxes, they have interactions and we *need* those interactions. Resources should know the following:

1) Where they are in the catalog compile stage and what they can access
2) Where they are in the list of similar resource types and what they can access
3) Where they are in the graph and what they can access

As we've grafted this ability into Puppet, we've gained the ability to do things like native 'concat' as well as rolling up resources to be able to fail unless all resource data is properly defined. I very strongly believe that a floating data type that is catalog accessible would be a good fundamental addition to the Puppet language.

See my response to John.
 
 

Details

"Read-only" vs. "init-only"

I understand -- I think -- the distinction you're drawing here, but it's confusing.  Surely if a resource does not yet exist, then all its attributes are logically written when it is created / initialized.  I could understand that is a plausible exception to read-onlyness if not for the explicit provision for init-only attributes.  If you want to maintain both of these cases among the attribute metadata then please rename one of them.  For example, perhaps "derived" or "internal" would describe what you mean by "read-only".

Compound resource names

I suppose that compound namevars are a topic that you hoped to avoid in your push for simplification, though it looks like they might not be hard to support via the interface you have already described.  This is a weak area for the present system, and one that I think any replacement should support well and fully.

Fundamentally, Puppet uses relational database concepts in the basic resource model and, if thought about as such, can use the usual tricks to support uniqueness using multi-key resources.

See my response to John.

 

Attribute / parameter clashes

By specifying attributes and operational parameters via separate hashes, the proposed system affords the opportunity for name collisions between these.  That also makes resource definitions less cohesive with Puppet overall, because in the Puppet language, attributes and operational parameters are specified together, in the same syntactic construct.  I fully understand the differences between these, but I suggest they not be reflected by a complete separation between their specifications, such as the proposal currently provides.

I think I'm OK with this provided that the validator flags any name collisions as a compile failure similar to what properties and parameters do now.

 
 

Resource implementations that become usable during a run

It is only relatively recent that Puppet implemented the long-desired feature of supporting resource providers that become usable as a result of other resources being managed earlier in the same run.  It is not clear whether or how that would be supported under the proposal as currently presented, but this is a feature that I do not think we want to lose.

Ditto. I really need this capability to remain.

yup.

Thanks for your time and work spent reviewing this.

Cheers, David

David Schmitt

unread,
Mar 2, 2017, 9:29:23 AM3/2/17
to puppe...@googlegroups.com
On 1 March 2017 at 14:12, Erik Dalén <erik.gus...@gmail.com> wrote:


On Wed, 1 Mar 2017 at 14:46 Trevor Vaughan <tvau...@onyxpoint.com> wrote:
As always, if I wait long enough, John will more eloquently provide the response that I wanted to!

I've added a few responses inline


Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.


I hadn't actually caught this, but if not just an oversight, it is indeed concerning. The ability to choose from various providers has been a strong play in Puppet's favor.

Although in some cases it is a bit annoying, for example not being able to install a gem and a deb with the same name on a host. This specific case was fixed in https://tickets.puppetlabs.com/browse/PUP-1073 but still exists for types other than package.
In many cases the providers actually manage different things, not mutually exclusive variants of the same thing.
But having deb_package and rpm_package would definitely just be annoying if you wanted a module that works across multiple distros. Putting both providers into the same implementation file would be a bit ugly, but could work I guess.
 
 
Inadequate interface documentation

As I said earlier, one of the biggest problems with the current system is inadequate documentation.  As it now stands, the proposal's documentation does about as well as the docs of the current system.  Missing is information about how the runtime environment is intended to use the defined objects and methods -- for example, under what circumstances it will instantiate / invoke them, how many times per transaction (partially addressed in the doc), at what point(s) in the transaction.  What may the environment do with the object returned by get() and its contents (i.e., must implementations assume that the environment may modify them)?  Is there anything the environment is required to do with them?  What may an implementation do with the object passed to set() and its contents?  Is there any expectation about relationships between the objects provided by get() and received by set()?


This is definitely true. Gary's blog has been the seminal resource for figuring out how to write types and providers and, instead of a new interface, it might be good to finish the original docs. That said, I think I remember David saying that part of the intent of this is to provide a language-agnostic interface to managing the system so that users aren't tightly bound to a particular implementation. I think that this is a good goal *but* I would maintain compatibility with existing Type and Provider implementations.

There's a bunch of things that can't be solved in any good way in the old type and provider system that looks like it would be pretty easy to solve in this new system. For example this old feature request: https://projects.puppetlabs.com/issues/2198 

Yeah, that is exactly the reason for passing an array of resources into the set() method. It'll require more infrastructure around it, but we have to start somewhere.
 
Specific features

Attribute and parameter validation

Validation ought to be performed during catalog building, thus it needs to be adequately addressed by type "definitions".  I am dissatisfied with the proposal's provision for that.  The Puppet type system is very expressive, but it can also be very arcane.  I maintain that it is a mistake to rely exclusively on the type system for validating "attributes" or "operational parameters".  Moreover, I think the proposal is missing an opportunity to provide for multiple-attribute, multiple-parameter covalidation.  This has been requested on and off for a long time, and this proposal is a great opportunity to finally put it in place.

This was one of my biggest worries early on and it has been solidified with the example code. The Type system simply isn't suitable for complex validation at this point. I need logic (sometimes stupid amounts of logic) and, until I can do arbitrary validation with the Type system, I can't use it for my parameters.

If all attributes were defined in a single type it could handle a lot more situations (although the type signature could get massive). Consider this simplified example for a file type:

Variant[
  Struct[{
    ensure => Enum[link],
    target => String,
  }],
  Struct[{
    ensure => Enum[file],
    content => String,
  }],
  Struct[{
    ensure => Enum[file],
    source => String,
  }],
]

That would only allow the target attribute if ensure is "link", and only allow either content or source if ensure is "file".
Accounting for all possible attributes on the file type would probably span a few pages though :)

Technically, a solution. :-D



Regards, David




David Schmitt

unread,
Mar 7, 2017, 2:38:26 PM3/7/17
to puppe...@googlegroups.com, Puppet Users, voxp...@groups.io
[Apologies, this got lost in my Drafts folder.]

On 28 February 2017 at 19:35, Erik Dalén <erik.gus...@gmail.com> wrote:
Overall I think this looks pretty good, but I have some questions and comments.

For implementations that can't enumerate all existing resources (at least not in any fast way), like for example File, should they define a get method that returns an empty hash, nil, or no get method at all? Returning nil or not defining the method would allow special handling of them, so that would probably be preferable.

But shouldn't it really have a way to fetch the state of a single resource for those implementations? How else would this handle something like "puppet resource file /path/to/file" -- traverse the entire file system and stat every file?

Yup, it should. I've added a short discussion of the problem to the "Known limitations" section. I could add verbiage around the get() method returning nil (as opposed to []) to signal "there might be instances available, but it is not possible to enumerate them".
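Under that convention, a non-enumerable implementation like file would be sketched as (illustrative only):

```ruby
def get
  # stat'ing the whole filesystem is not feasible, so signal "instances
  # may exist, but cannot be enumerated" instead of returning a list
  nil
end
```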
 
It also seems it would be handy to have some helper method that compares current_state & target_state and filters out the changes that actually need to happen. I would imagine that sort of code would otherwise be duplicated across many implementations.

True. Porting the existing types and providers from our supported modules is high on my list to validate the design, and will expose the patterns. That'll make it much easier to build a library of helpers.
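A sketch of the kind of helper Erik describes; this is hypothetical, not part of the proposal:

```ruby
# keep only the resources whose managed attributes differ from current state
def changed_resources(current_state, target_state)
  current_by_name = Hash[current_state.map { |r| [r[:name], r] }]
  target_state.reject do |target|
    current = current_by_name[target[:name]]
    # a resource is unchanged if every managed attribute already matches
    current && target.all? { |key, value| current[key] == value }
  end
end
```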
 
logger.attribute_changed could really have a default value for the message: if the implementation provides attribute, old_value & new_value, it shouldn't also need to supply "Changed #{attribute} from #{old_value} to #{new_value}".

That's already specified.
 
Will the attribute type fields be able to use type definitions shipped in the same module or present in the environment?

Currently anywhere "puppet 4 data types" are mentioned, only the built-in types are usable. This is because the type information is required on the agent, but puppet doesn't make it available yet. This work is tracked in [PUP-7197](https://tickets.puppetlabs.com/browse/PUP-7197), but even once that is implemented, modules will have to wait until the functionality is widely available before being able to rely on it.
 
Will they be able to inspect the catalog or communicate between implementations, like in the concat & concat_fragment case?

My preferred method to solve that would be a new top-level plugin that can inspect and change the catalog after it has been built. That plugin could then collect all the fragments and build a file resource from them, before the catalog is even passed to the agent.
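Purely as a thought experiment (no such plugin API exists today; all names here are invented), such a transform might look like:

```ruby
def transform(catalog)
  # collect all fragments, grouped by their target file
  fragments = catalog.resources.select { |r| r.type == 'Concat_fragment' }
  fragments.group_by { |f| f[:target] }.each do |target, frags|
    content = frags.sort_by { |f| f[:order] }.map { |f| f[:content] }.join
    # replace the fragments with one ordinary file resource
    catalog.add_resource(Puppet::Resource.new('file', target,
      parameters: { content: content }))
    frags.each { |f| catalog.remove_resource(f) }
  end
  catalog
end
```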


Will they be able to generate resources? That could probably be done with something similar to the logger methods, but there is nothing like that in the proposal. I assume they won't have a generate method.

Adding more discrete functionality and life-cycle methods later seems like an easy extension that will not require a major API revision. It is also possibly another use-case for a catalog transform plugin.
 
Will there be some requirement that these implementations are thread safe so puppet can start managing multiple resources in parallel? Or will they run in some subprocess even?


Theoretically, those implementations could be run in a separate process. Looking at things like Lutterkort's libral, this might happen quicker than I think today. As a firm believer in small, incremental steps, I only want to speculate on those things as far as I need to avoid blocking a future change. Therefore I'm focusing this new API mostly on being small, self-contained, and decoupled.


Thanks for your time and work in reviewing this!

Regards, David


David Lutterkort

unread,
Mar 8, 2017, 6:55:37 PM3/8/17
to puppe...@googlegroups.com
On Wed, Mar 1, 2017 at 5:46 AM, Trevor Vaughan <tvau...@onyxpoint.com> wrote:

Big picture items

Multiple implementations / implementation selection

In splitting resources into "definition" and "implementation", the proposal adheres to a form similar to the current system's, but it seems to have fundamentally different design goals.  I've always interpreted the present type / provider system's separation of resource interface from implementation first and foremost as a means to accommodate multiple implementations. The one most appropriate to the target system is chosen from among those available.  I think that's a sound design approach; I like it, and it has served Puppet well.  As far as I can determine, however, the proposal loses that completely -- I see no means to support multiple implementations at all, much less means to match an implementation to the target system.


I hadn't actually caught this, but if not just an oversight, it is indeed concerning. The ability to choose from various providers has been a strong play in Puppet's favor.

The main weakness of the current system is that the data type for a resource changes based on the provider that is ultimately used for it. Once you use yum-specific attributes in your package resource, it's only going to work with the yum provider, though that might be hard to tell from just looking at the resource.

I would favor an approach where you can write a resource either directly to the interface the provider actually implements (i.e., package_yum - use all the yum-specific stuff you want) or a generic/abstracted type interface that is defined so that it can be fulfilled by all the providers for it.

To phrase that slightly differently: I've come to look at types and providers as both defining data types for the resources they handle. Providers define a concrete, definite list of the attributes that they handle. Types in that view are an abstraction across multiple providers, and more reminiscent of Java interfaces or Rust traits in that respect.

Do you see issues with that approach? The question of how much work it should be to actually make the abstracted type interface come into being is important, but somewhat secondary to whether that general approach is useful.

David

