RFC - A specification for module schemas

Corey Osman

Jan 29, 2016, 11:45:07 PM
to Puppet Dev
Hi,

I wanted to bring up a conversation in hopes that we as a community can create a specification for something I am calling module schemas.  Before I get into that I want to provide a little background info.

This all started a few years ago when hiera first came out. Data separation in the form of parameters and automatic hiera lookups quickly became the norm, and reusable modules exploded into what the forge is today. Because of hiera's popularity, though, data validation is now a major problem. Without good data, excellent modules become useless.

Puppet 4 and stdlib brought many new functions and ways to validate incoming data, and I now consider puppet 4 to be a loosely typed language. Hell, there was even this a long time ago: https://github.com/puppetlabs/puppetlabs-kwalify  But puppet only does so much, and while having validation reside in code might make troubleshooting a snap, there is still a delay in the feedback loop when the code is tightly coupled with an external “database” of data, data that is often inserted by non-puppet developers who don’t know YAML or data structures.

So with that said, I want to introduce something new to puppet module development, called module schemas. A module schema is a specification that details the inner workings of a module. For right now this means a detailed specification of all the parameters for the classes and definitions in a module, whose goal is to make it impossible to insert a bad data structure. Ideally, though, we can specify so much more (functions, types, providers, templates), even hiera calls in weird places like templates and functions, which usually do not get documented, are hard to reference, and usually require looking at source code.

What does such a schema look like?

Here is an example schema for the apache module, which contains 446 parameters!  https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml

The most immediate use case for such a schema is hiera validation, as I have outlined here: http://logicminds.github.io/blog/2016/01/16/testing-hiera-data  It works AWESOME! We are validating hiera data, not just YAML syntax, and doing it in under 500 ms for every commit on every single file.

As a community we need a solution for validating hiera data. It's my belief that schemas are the way to go. After all, hiera data now lives in modules with no easy way to validate it.
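
To make the idea concrete, here is a minimal Python sketch of checking hiera data against a schema. The schema layout, the parameter names and types, and the `validate_hiera` helper are all assumptions made for illustration; this is not the format used in apache_schema.yaml, and the data is a plain dict standing in for a parsed YAML file.

```python
# Hypothetical, simplified schema: fully qualified hiera key -> expected type.
# The keys and types are illustrative only, not taken from apache_schema.yaml.
SCHEMA = {
    "apache::default_vhost": bool,
    "apache::server_tokens": str,
    "apache::keepalive_timeout": int,
}


def type_ok(value, expected):
    """True if value has the expected type (bool is not accepted as int)."""
    if expected is int and isinstance(value, bool):
        return False  # bool subclasses int in Python; treat it as a mismatch
    return isinstance(value, expected)


def validate_hiera(data, schema):
    """Return a list of error strings for one parsed hiera data file."""
    errors = []
    for key, value in data.items():
        expected = schema.get(key)
        if expected is None:
            continue  # no schema entry for this key, nothing to check
        if not type_ok(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


if __name__ == "__main__":
    # Stand-in for a parsed YAML file; "yes" should have been a boolean.
    hiera_data = {"apache::default_vhost": "yes", "apache::keepalive_timeout": 15}
    for error in validate_hiera(hiera_data, SCHEMA):
        print(error)
```

A check like this needs neither puppet nor a catalog compile, which is what would let it run on every commit.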

Other use cases that come to mind:

  - Generating documentation. Many modules on the forge contain a static map of the parameters used inside the module. If a schema were present, we could generate that same map automatically.

  - Useful for other 3rd party tools like puppet strings.

  - Parameter specification lookup. Imagine a face that shows internal puppet module specifications. I am not talking about puppet-strings; this would detail the parameters given a class, or an example parameter value given a parameter name.

    Scenario:
      - puppet module puppetlabs/apache  (outputs all the parameters and classes for that module in a specified format, json or yaml)
      - puppet module puppetlabs-apache::class_name  (outputs all the parameters for the class in a specified format, json or yaml)
      - puppet module puppetlabs-apache::class_name::param1  (outputs an example value for that parameter, as well as the default value, in a specified format, json or yaml)
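
The lookup scenario can be sketched in a few lines of Python. Everything here is hypothetical: the nested schema layout, the placeholder parameter details, and the `describe` helper are assumptions for illustration, not an existing puppet face or the layout of the linked schema files.

```python
import json

# Hypothetical schema layout: module -> class -> parameter -> details.
# All names and values below are placeholders.
MODULE_SCHEMA = {
    "puppetlabs-apache": {
        "class_name": {
            "param1": {"type": "String", "default": "foo", "example": "bar"},
            "param2": {"type": "Boolean", "default": True, "example": False},
        }
    }
}


def describe(path, schema=MODULE_SCHEMA):
    """Return the schema subtree named by 'module[::class[::param]]' as JSON."""
    node = schema
    for part in path.split("::"):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"nothing in the schema matches {path!r}")
        node = node[part]
    return json.dumps(node, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Parameters of one class, then the details of a single parameter.
    print(describe("puppetlabs-apache::class_name"))
    print(describe("puppetlabs-apache::class_name::param1"))
```

Because the lookup only walks a data structure, a tool could answer these queries from a schema file without loading any puppet code.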

Foreman and Puppet Console need this level of detail as well.  Currently, both of these solutions spend quite a bit of time parsing code to show parameters for UI display.   It would be much easier if a schema was available that detailed this level of data.  Think of the speed improvements that could be had if this information was “cached” in a file.   These solutions currently load or intelligently scan all the puppet code for every puppet environment to get the parameters and defaults.   

Here is how we can create a schema: http://logicminds.github.io/blog/2016/01/15/how-to-build-a-module-schema/ (which I even automated with puppet-retrospec, https://github.com/nwops/puppet-retrospec.git)

However,  we all need to agree on something before schemas can ever be a “thing”.  We need a schema for module schemas.  This is important because as soon as 3rd party tools or scripts start to use schemas and later we decide the schema needs changing, everything breaks.  Tools need a specification to work from. 

So with this in mind, here is an example schema: https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml  How can this be improved? What should we add?

About the only change I was pondering was adding another object for the types themselves.   https://github.com/logicminds/puppet_module_schemas/blob/master/specification_with_types.yaml

What are your thoughts?  What steps do we need to take to make this a supported specification?  What would you desire in a module schema?

Am I the only one that thinks this is a killer solution?


Corey Osman

R.I.Pienaar

Jan 30, 2016, 1:47:48 AM
to puppet-dev
This in general is something I've wanted for a long time, and I think we're almost
getting it for free now in Puppet 4.

In Puppet 4 you can do:

class x(String $y) { }

or

class x(String[1,10] $y) { }

or

class x(Pattern[/\A[a-z].*/] $y) { }

or
class x(Enum["stopped", "running"] $y) { }

and many more, including very complex matchers. This is a lot more featureful AND
maps 1:1 to the capabilities puppet has natively.

I think there are ways now to introspect the classes and extract this metadata
automagically. If not, then I think *that* is the feature we should get added to
Puppet, and from there build the external validation, introspection and testing
for data. That will give a solution that progresses as Puppet does and gives a
lot more "real" results than trying to map this stuff externally to what Puppet
supports.

The puppet lookup or similar CLI can be extended to include validation.
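
For readers who have not used the Puppet 4 types yet, here is a rough Python analogue of what the three parameterized types above check. This is only an illustration: the helper names are made up, it covers just these three types, and Python's `re` module is not identical to the Ruby regular expressions that Puppet uses.

```python
import re


def string_type(min_len, max_len):
    """Rough analogue of String[min, max]: a string with length in range."""
    return lambda v: isinstance(v, str) and min_len <= len(v) <= max_len


def pattern_type(regex):
    """Rough analogue of Pattern[/regex/]: a string that matches the regex."""
    compiled = re.compile(regex)
    return lambda v: isinstance(v, str) and compiled.search(v) is not None


def enum_type(*allowed):
    """Rough analogue of Enum[...]: a string that is one of the listed values."""
    return lambda v: isinstance(v, str) and v in allowed


if __name__ == "__main__":
    checks = {
        "String[1,10]": string_type(1, 10),
        "Pattern[/\\A[a-z].*/]": pattern_type(r"\A[a-z].*"),
        'Enum["stopped", "running"]': enum_type("stopped", "running"),
    }
    for name, check in checks.items():
        print(name, check("running"), check(""))
```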



Gareth Rushgrove

Jan 30, 2016, 12:29:02 PM
to puppe...@googlegroups.com
I think there are some interesting ideas here but I want to pull out
the problems, and then jot down a few thoughts. Lots of comments
inline.


On 30 January 2016 at 04:45, Corey Osman <co...@logicminds.biz> wrote:
> Hi,
>
> I wanted to bring up a conversation in hopes that we as a community can
> create a specification for something I am calling module schemas. Before I
> get into that I want to provide a little background info.
>
> This all started a few years ago when hiera first came out. Data seperation
> in the form of parameters and auto hiera lookups quickly became the norm and
> reusable modules exploded into what the forge is today . Because of the
> popularity of hiera, data validation is now a major problem though. Without
> good data, excellent modules become useless.
>
> Puppet 4 and stdlib brought many new functions and ways to validate incoming
> data, and I consider puppet 4 to now be a loosely typed language now.
> Hell, there was even this a long time ago:
> https://github.com/puppetlabs/puppetlabs-kwalify But puppet only does so
> much, and while having validation reside in code might make troubleshooting
> a snap, there is still a delay in the feedback loop when the code is tightly
> coupled with an external “database” of data. Data that is inserted by non
> puppet developers who don’t know YAML or data structures.
>

This appears to be the core problem, and I think it's worth spelling
out separate from the proposed implementation.

Given a set of hiera data, and given a set of puppet modules, how can
I tell if my hiera data is valid?

> So with that said I want to introduce something new to puppet module
> development, called module schemas. A module schema is a specification that
> details the inner workings of a module.

Just to be a little pedantic, this isn't the inner-workings, but the
interface the module presents to the user.

> For right now this means a
> detailed specification of all the parameters for classes and definitions
> used inside a module who’s goal is to make it impossible to insert a bad
> data structure. But ideally, we can specify so much more (functions, types,
> providers, templates) even hiera calls in weird places like templates and
> functions, which are usually things that do not get documented and are hard
> to reference and usually requires looking at source code.
>

A clarifying question. Are you imagining this as something that is
created and maintained by hand, and kept as a concrete thing (i.e a
file in git alongside the code) or as a serialisation that is
generated as needed (and potentially cached by the consumer) from the
Puppet code?

> What does such a schema look like?
>
> Here is a example schema for the apache module which contains 446
> parameters!.
> https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml
>
> The most immediate use case for such a schema is hiera validation as I have
> outlined here:
> http://logicminds.github.io/blog/2016/01/16/testing-hiera-data. Which works
> AWESOME!. We are validating hiera data and not YAML and doing it under 500
> ms for every commit on every single file.
>
> As a community we need a solution for validating hiera data. Its my belief
> that schemas are the way to go. After all hiera data is now in modules
> with no way to easily validate.
>
> Other use cases that come to mind:
>
> - generating documentation (Many modules on the forge usually contain a
> static map of parameters used inside the module). If a schema was present,
> we could just generate that same map automatically.
>

Strings is looking at documentation generation from Puppet code. You
don't actually need a schema as an intermediary format here either.

> - useful for other 3rd party tools like puppet strings
>
> Parameter specification lookup
> - Imagine a face that shows internal puppet module specifications. I am
> not talking about puppet-strings, this would detail the parameters given a
> class, or an example parameter value given a parameter name.
>

I think in both this and the above case what you're really saying is
that there should be a high-level API/library for parsing Puppet code
and extracting information in a useful format? Or similar to the above
are you seeing this as something that is managed separately from the
code?

> Scenario:
> - puppet module puppetlabs/apache (outputs all the parameters,
> classes for that module) in a specified format (json or yaml)
> - puppet module puppetlabs-apache::class_name (outputs all the
> parameters for the class in a specified format (json or yaml)
> - puppet module puppetlabs-apache::class_name::param1 (outputs an
> example value for that parameter, as well as the default value) in a
> specified format (json or yaml)
>
> Foreman and Puppet Console need this level of detail as well. Currently,
> both of these solutions spend quite a bit of time parsing code to show
> parameters for UI display. It would be much easier if a schema was
> available that detailed this level of data. Think of the speed improvements
> that could be had if this information was “cached” in a file.

Caching could be part of the API, but it's probably more useful for
caching to be part of the consumer (ie. Foreman or whatever), because
knowing when to bust the cache is often context specific.

> These
> solutions currently load or intelligently scan all the puppet code for every
> puppet environment to get the parameters and defaults.
>
> Here is how we can create a schema
> http://logicminds.github.io/blog/2016/01/15/how-to-build-a-module-schema/
> (which I even automated with retrospect-puppet
> (https://github.com/nwops/puppet-retrospec.git)
>
> However, we all need to agree on something before schemas can ever be a
> “thing”. We need a schema for module schemas. This is important because as
> soon as 3rd party tools or scripts start to use schemas and later we decide
> the schema needs changing, everything breaks. Tools need a specification to
> work from.
>
> So with this in mind and an example schema here:
> https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml.
> How can this be improved? What should we add?
>
> About the only change I was pondering was adding another object for the
> types themselves.
> https://github.com/logicminds/puppet_module_schemas/blob/master/specification_with_types.yaml
>
> What are your thoughts? What steps do we need to take to make this a
> supported specification? What would you desire in a module schema?
>
> Am I the only one that thinks this is a killer solution?
>

I think I agree with RI that we want to have a standard way to
"introspect the classes and extract this metadata
automagically" and from there build those tools.

For more relevant context. My points above are mainly about a concern
that maintaining a schema by hand separate to the puppet code itself
(which describes the same thing) is a perilous path. RI's point was
that you already have the Puppet code.

The other option is to flip this on its head, and generate Puppet
code from a schema. As luck would have it I've been doing some of this
with surprising success recently with the Kubernetes module.

https://github.com/garethr/garethr-kubernetes
https://github.com/garethr/puppet-swagger-generator

Kubernetes has a Swagger schema
(https://github.com/garethr/garethr-kubernetes/blob/master/v1.json),
which describes lots of information about the Kubernetes API,
including about the resources and properties of resources.

In this case it's worth it, mainly as a time saving mechanism to both
create and to maintain the code. The Kubernetes module has about 200
lines of Ruby written by me, and about 16000 lines of Ruby written by
the generator.

It would be totally possible to generate pure Puppet code from a
schema in a similar way. However, I'm not sure it would solve many
actual problems outside maybe shorthands for things like complex
types. Ultimately you'd be taking a pure data format (likely with lots
of repetition and all the fun of pure data) and generating in Puppet
something that's actually more succinct. It's worth noting in the
Kubernetes examples that the schema itself is generated from the
Kubernetes Go source code, it's not hand crafted.

So, an API that gives you an intermediary format describing the code
would likely be a good thing, and several tools could use it. I have a
sneaking suspicion this might already be in Puppet Strings :)

running strings with --emit-json-stdout gives you a JSON schema which
is defined here:

https://github.com/puppetlabs/puppetlabs-strings/blob/master/json_dom.md#defined-types

And Strings is now available to install as a gem, so should be able to
be used as a library for a Hiera data validator.

We could probably take a run at something at the Contributor Summit if
a few people are interested?

Phew, finished

Gareth






--
Gareth Rushgrove
@garethr

devopsweekly.com
morethanseven.net
garethrushgrove.com

Corey Osman

Jan 30, 2016, 1:47:04 PM
to Puppet Developers
This is one drawback of using an external schema parser: puppet has way more useful types to check against. Of course, Puppet 3 only has the basics (bool, string, array, hash). I have thought about forking the kwalify parser and adding more data types so it would be aware of some puppet data types (absolute path, cert_type, ...). I could go down that route, but I would probably be the only maintainer.

> I think there are ways now to introspect the classes and extract this metadata
> automagically, if not then I think *that* is the feature we should get added to
> Puppet and from there build the external validation, introspection and testing
> for data as that will give a solution that progresses as Puppet does and give a
> lot more "real" results than trying to map this stuff externally to what Puppet
> supports
>
> The puppet lookup or similar CLI can be extended to include validation.

While having this built into puppet would be ideal, there are still people on 2.7, and many more on 3.x, so it might take some time to migrate them to 4.3.x. Not to mention that almost all forge modules don't include type checking for fear that they will discriminate against 3.Xers. (At least that's how I feel. Internal private modules are a different story.)

Having a tool external to puppet means that it is version independent. You don't have to upgrade to puppet 4.X to get validation. I think this alone is a very good use case. I also believe there is room for an internal puppet tool as well which would eventually replace the external tool. Furthermore, having an external schema also means that when you do upgrade to puppet 4.x you can map your external schema to puppet data types and update 3.x code to utilize data types with a tool to retrofit those additions automatically. 

Corey Osman

Jan 30, 2016, 10:29:09 PM
to Puppet Developers


On Saturday, January 30, 2016 at 9:29:02 AM UTC-8, Gareth Rushgrove wrote:
> I think there are some interesting ideas here but I want to pull out
> the problems, and then jot down a few thoughts. Lots of comments
> inline.
>
> On 30 January 2016 at 04:45, Corey Osman <co...@logicminds.biz> wrote:
> > Hi,
> >
> > I wanted to bring up a conversation in hopes that we as a community can
> > create a specification for something I am calling module schemas.  Before I
> > get into that I want to provide a little background info.
> >
> > This all started a few years ago when hiera first came out. Data seperation
> > in the form of parameters and auto hiera lookups quickly became the norm and
> > reusable modules exploded into what the forge is today .  Because of the
> > popularity of hiera, data validation is now a major problem though.  Without
> > good data, excellent modules become useless.
> >
> > Puppet 4 and stdlib brought many new functions and ways to validate incoming
> > data, and I consider puppet 4 to now be a loosely typed language now.
> > Hell, there was even this a long time ago:
> > https://github.com/puppetlabs/puppetlabs-kwalify  But puppet only does so
> > much, and while having validation reside in code might make troubleshooting
> > a snap, there is still a delay in the feedback loop when the code is tightly
> > coupled with an external “database” of data.  Data that is inserted by non
> > puppet developers who don’t know YAML or data structures.
> >
>
> This appears to be the core problem, and I think it's worth spelling
> out separate from the proposed implementation.

I think everyone at every level has fat fingered a typo or inserted invalid data at some point. 
While this is a problem, it also speaks highly of puppet, because I can have my grandmother flip a feature flag to install some complicated thing on a bunch of systems by editing some simple text. ;)
 

> Given a set of hiera data, and given a set of puppet modules, how can
> I tell if my hiera data is valid?
>
> > So with that said I want to introduce something new to puppet module
> > development, called module schemas.  A module schema is a specification that
> > details the inner workings of a module.
>
> Just to be a little pedantic, this isn't the inner-workings, but the
> interface the module presents to the user.

Yea, that is much better. 
 

> > For right now this means a
> > detailed specification of all the parameters for classes and definitions
> > used inside a module who’s goal is to make it impossible to insert a bad
> > data structure.  But ideally, we can specify so much more (functions, types,
> > providers, templates) even hiera calls in weird places like templates and
> > functions, which are usually things that do not get documented and are hard
> > to reference and usually requires looking at source code.
> >
>
> A clarifying question. Are you imagining this as something that is
> created and maintained by hand, and kept as a concrete thing (i.e a
> file in git alongside the code) or as a serialisation that is
> generated as needed (and potentially cached by the consumer) from the
> Puppet code?

Yes, I do envision a static schema file that would be at the mercy of the developer to update. The reason behind this is that puppet (from my eyes, and definitely 3.x) cannot tell me the schema of a given parameter. Keep in mind I haven't used 4.3 yet. If using 4.3.x, I think the serialization would be the better route, since puppet can spit that info out. Having an external file in the codebase does allow for easier consumption, at the risk that it might be out of date. It also keeps us from having to load puppet and parse out that information every time.

A high-level API for parsing Puppet code would be extremely useful. Currently every third party tool (puppet-lint, retrospec, strings, foreman, console) implements its own way to get the same information. It would definitely be easier for all of us if there were a higher level API. Seems like this could be done with a new face that is not part of the core.

Yes, very true. But this only makes sense if your puppet code is written in 4.x.
  

> The other option is to flip this on it's head, and generate Puppet
> code from a schema. As luck would have it I've been doing some of this
> with surprising success recently with the Kubernetes module.

Yea, that's pretty dope. I thought swagger was just for making REST APIs.

R.I.Pienaar

Jan 31, 2016, 2:31:36 AM
to puppet-dev
Helping people shoot themselves in the foot by using out of date software and
soon to be unsupported versions of puppet is a mistake. Look to the future and
build for the future. Puppet 4 is VERY VERY different from puppet 3 to the point
of being something entirely new.

Maintaining backwards support will simply ensure you rapidly become obsolete.

What you're proposing is big and important, and the data landscape in Puppet has
changed and will continue to change quite rapidly. Puppet 3 compatibility will just
mean you end up NOT serving an ever-growing user base as people adopt Puppet 4.

Corey Osman

Jan 31, 2016, 3:37:53 PM
to Puppet Developers
I think we have strayed off topic here. Being able to validate hiera should be something that can easily be done by anyone, no matter which version of puppet they use. The core problem is bad data going into hiera and then into puppet. The consensus is that we all know this is a problem. While my primary goal was to validate hiera, I think there are other use cases for having an intermediate serialization format of the module's interfaces stored in a file or retrieved dynamically with a puppet face.

To summarize some of the points discussed:

Building a schema:
  -  We need a higher level API for gathering module types, parameters, and default values given a module, file, class or parameter
     - Puppet should provide a way to output this information in a serialized format and pure ruby objects
        - format should be pluggable with customizable formats (JSON, YAML, Module Schema, hiera data schema, ...)
        - should leverage puppet's built in datatypes  
        - build a hiera data schema based on all the modules in puppet's modules path specific for each puppet environment

Validating data
  -  Given a hiera data schema, hiera should be able to validate its data, implemented by each backend provider
      - hiera data schemas are unique to every user

Help, not force, people to use puppet 4
  -  Given a module schema, retrofit puppet 3 code with puppet 4 data types in the module's source code
     - swagger-like functionality, except that it's updating code
     - This helps people move from puppet 3 to puppet 4
  - Folks who cannot move to puppet 4 immediately can get the best of both worlds with an easier way to migrate to puppet 4

Module Schema
  - This was never discussed, what should this look like?  Schemas are necessary whether they are statically or dynamically generated. 



Corey

John Bollinger

Feb 1, 2016, 12:03:46 PM
to Puppet Developers


On Sunday, January 31, 2016 at 2:37:53 PM UTC-6, Corey Osman wrote:
 
> I think we have strayed off topic here. Being able to validate hiera should be something that can easily be done by anyone no matter which version of puppet they use.


I agree that being able to validate Hiera data would be useful for everyone, no matter what version of Puppet they rely upon.  I have no beef at all with anyone who wants to write tools that have broader version support, as opposed to narrower.  I am quite open to discussing what such tools might look like, how they might work, and what their inputs and outputs might be.

 
> The core problem is bad data going into hiera and then into puppet.  The consensus is that we all know this is problem.   While my primary goal was to validate hiera, I think there are other use cases for having an intermediate serialization format of the module's interfaces stored in a file or retrieved dynamically with a puppet face.



I agree that bad data is a problem, and a widely recognized one.  Tools and procedures for validating Hiera data are an excellent idea, and I am open to the possibility that a module schema such as you describe might have useful broader applications.  Allowing for such schemata to be obtained dynamically seems the forward-looking approach, but it does not have to be exclusive of static schemata.

Pragmatically, targeting static schemata first may be the best way to get such an effort off the ground. If we sacrifice "good" on the altar of "best" then we stand a good chance of being eternally stuck at "meh".

 
> To summarize some of the points discussed:
>
> Building a schema:
>   -  We need a higher level API for gathering module types, parameters, and default values given a module, file, class or parameter
>      - Puppet should provide a way to output this information in a serialized format and pure ruby objects
>         - format should be pluggable with customizable formats (JSON, YAML, Module Schema, .hiera data schema, ..)
>         - should leverage puppet's built in datatypes
>         - build a hiera data schema based on all the modules in puppet's modules path specific for each puppet environment



I agree that it would be useful for there to be a mechanism for gathering such information from Puppet manifests.  To whatever extent that needs to be built in to Puppet itself, it seems unlikely that such a feature would appear in any version of Puppet older than the development tip.

As far as pluggable formats go, if you mean output formats then I'm unconvinced.  Or perhaps I would just componentize differently.  It seems to me that a single, flexible form that can serve as a lingua franca should be the immediate target, and I guess I would choose a Ruby object form for that.  If the result is wanted in one or more external formats then defining and emitting the needed outputs is a separate problem, and likely a much simpler one.

As far as input formats go, I already opined that the best starting point would probably be a static, external schema format, at least for schemata that are not prepared programmatically in object format from the beginning.  There is perhaps room to support more input formats, but I'm not immediately seeing why such support would be more than a tiny win.

 
> Validating data
>   -  Given a hiera data schema, hiera should be able to validate its data, implemented by each backend provider
>       - hiera data schemas are unique to every user



It's unclear to me how building validation directly into Hiera would gain anything if the idea is to rely on schemata gleaned dynamically from manifests in the first place.  I don't see how Hiera could be any more effective than the catalog builder at detecting bad data at runtime if the two are relying on the same (meta)data.  If it isn't any better then putting validation into Hiera would just move the point at which certain data errors are detected, at the cost of additional processing overhead.

On the other hand, I do think that validating on top of hiera is better than validating the underlying data directly.  Puppet sees the data only through the lens of Hiera, and if one is validating for Puppet then one wants to rely on the same view of the data that Puppet has.  Moreover, validating on top of Hiera is independent of any particular Hiera back end.  It may be that endowing Hiera with one or two new capabilities would facilitate offline data validation.  For example, one might want to request a full dump of all data, so as to look for extraneous / misspelled keys.
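
As a small illustration of the extraneous / misspelled key check, here is a Python sketch. It assumes the full dump of keys has already been obtained somehow (the dump capability itself is hypothetical), and the helper name and sample keys are invented for the example.

```python
import difflib


def find_suspect_keys(hiera_keys, known_params):
    """Map each unknown key in a schema-covered module to its closest known name."""
    covered_modules = {param.split("::")[0] for param in known_params}
    suspects = {}
    for key in hiera_keys:
        if key in known_params:
            continue
        if key.split("::")[0] not in covered_modules:
            continue  # no schema for this module, so we cannot judge the key
        # An empty list means the key is unknown and nothing similar exists.
        suspects[key] = difflib.get_close_matches(key, known_params, n=1)
    return suspects


if __name__ == "__main__":
    known = ["apache::default_vhost", "apache::server_tokens"]
    dump = ["apache::defualt_vhost", "apache::server_tokens", "site::banner"]
    for key, guesses in find_suspect_keys(dump, known).items():
        print(key, "->", guesses or "no close match")
```

Keys in modules without a schema are skipped rather than reported, since site-specific keys are legitimate and would otherwise drown out real typos.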

 
> Help not force people to use puppet 4
>   -  Given a module schema, retrofit puppet 3 code with puppet 4 data types into the module's source code
>      - swagger like functionality, with the exception that its updating code
>      - This helps people move from puppet 3 to puppet 4
>   - Folks who cannot move to puppet 4 immediately can get the best of both worlds with a easier way to migrate to puppet 4


Isn't this what P3's future parser is for?  I could see the value of validating data against a more detailed schema than can be extracted from P3 manifests, but I don't see it as migration assistance.  If the data are wrong then that's an inherent problem, not a migration issue.  Migration is in fact a solution, of sorts, to that problem, inasmuch as manifests written with P4 explicit data types can do a better job of validating data and therefore detecting data problems themselves.

 

> Module Schema
>   - This was never discussed, what should this look like?  Schemas are necessary whether they are statically or dynamically generated.


I think an information model would be a better starting place than a physical example of a possible schema manifestation.  What kinds of objects must the schema be able to represent?  You mentioned several, but it seems that only a couple of them are represented in the YAML data you linked.  What attributes must each type of object have?


Overall, I think this idea has considerable potential, but I am concerned that it is somewhat unfocused once one goes beyond the central ideas, and that no path to full implementation is mapped out.  I'm inclined to think that the most promising way forward would be to embark on the path that Hiera itself took: (1) build a tool; (2) prove it useful; (3) get it integrated; (4) expand from there.  The "persuade PL up front that it should be done" option that you seem to be on now is commendably audacious, and it may yet bear fruit, but it seems like a low-percentage play.  If you want to continue along that route, though, then it seems like the next step might be to prepare an ARM.


John

Henrik Lindberg

Feb 1, 2016, 1:09:13 PM
to puppe...@googlegroups.com
There were many great replies to this; I am following up on them
and on the comments made elsewhere in one go here.
> schema was available that detailed this level of data.. Think of the
> speed improvements that could be had if this information was “cached” in
> a file. These solutions currently load or intelligently scan all the
> puppet code for every puppet environment to get the parameters and
> defaults.
>
> Here is how we can create a schema
> http://logicminds.github.io/blog/2016/01/15/how-to-build-a-module-schema/
> (which I even automated with retrospect-puppet
> (https://github.com/nwops/puppet-retrospec.git)
>
> However, we all need to agree on something before schemas can ever be a
> “thing”. We need a schema for module schemas. This is important
> because as soon as 3rd party tools or scripts start to use schemas and
> later we decide the schema needs changing, everything breaks. Tools
> need a specification to work from.
>
> So with this in mind and an example schema here:
> https://github.com/logicminds/puppet_module_schemas/blob/master/apache_schema.yaml.
> How can this be improved? What should we add?
>
> About the only change I was pondering was adding another object for the
> types themselves.
> https://github.com/logicminds/puppet_module_schemas/blob/master/specification_with_types.yaml
>
> What are your thoughts? What steps do we need to take to make this a
> supported specification? What would you desire in a module schema?
>
> Am I the only one that thinks this is a killer solution?
>
>
> Corey Osman
>

First, about schemas/meta-models:

We are working on the Puppet Type System to make it powerful enough to
describe a complete schema. We are doing this so that there is a meta
level (schema level) in Puppet that can serve as the foundation for
serialization and model/schema transformations, i.e. so that you can
take such a Puppet meta-model (i.e. schema) and transform it into some
other kind of schema.

The new meta-model is being based on the Puppet 4.x type system.

As R.I pointed out, when typing everything in Puppet 4.x this does
define all of the constraints on any data provided via data binding.

What is a bit more difficult to automatically extract are the
expectations on data keys and type constraints for keys that are simply
looked up. This cannot be achieved until runtime since keys (and also
expectations) are dynamically evaluated. Thus, to be able to validate,
there would need to be a static declaration of the expectations for these keys.

In addition, it would be bad design for a module to depend on
something else to supply its data (i.e. to come without defaults).

With puppet 4, you can write the validation in puppet itself.
A module could call a mymodule::verify_data_expectations() function
(written in .pp syntax). From the command line, you can then run that
with puppet apply:

puppet apply -e 'mymodule::verify_data_expectations()'

Or, always run this at runtime.

The function itself looks like a schema - simply map keys to types and
iterate.

{
  'foo::bar'     => Integer[1,10],
  'a_hash_merge' => Struct[{ a => Integer, b => String[1] }],
  # ...
}.each |$key, $type| {

  $val = lookup($key) # handle missing key here
  # the block receives the expected type ($t) and the actual type ($v)
  assert_type($type, $val) |$t, $v| {
    fail("The lookup of key ${key} expected a type of ${t}, but got the non-compliant value ${val} of type ${v}")
  }
}

...or some variation on that - there are several options on both lookup
and assert_type that can be used, and assert_type IIRC uses heuristics
to point to where the type diverges from the wanted type (which is
better for complex types), so it may be more helpful than the manually
crafted error message. And - if the automatic assertion is enough, then
the expected type can be presented directly to lookup and it will do the
type checking.
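
A minimal sketch of that last variation, where the expected type is handed straight to lookup() as its second argument. The file path, the module name and both keys are hypothetical; only lookup() and the type syntax come from Puppet 4 itself.

```puppet
# Hypothetical file: mymodule/functions/verify_data_expectations.pp
function mymodule::verify_data_expectations() {
  # The "schema": each data key mapped to the type its value must have.
  $expectations = {
    'foo::bar'     => Integer[1, 10],
    'a_hash_merge' => Struct[{ a => Integer, b => String[1] }],
  }

  $expectations.each |$key, $expected_type| {
    # lookup() raises an error if the key is missing (no default is given)
    # or if the value found does not match $expected_type.
    lookup($key, $expected_type)
  }
}
```

This drops the hand-written error message in favour of whatever lookup() reports, which may or may not be detailed enough for complex types.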

Then, if everyone does this the same way, the puppet hash is the schema,
and it could be referenced in module metadata - perhaps by giving
the name of the function that validates.

Then, it would be easy to write something that iterates over all modules
in an environment and calls each module's data-expectancy validation
function.
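
A rough sketch of such an iteration, under two assumptions that are not part of the proposal above: the list of validator names is maintained by hand (nothing here discovers modules in an environment), and the call() function is available, which as far as I know requires Puppet 5.0 or later.

```puppet
# Hypothetical list of per-module validation functions. In the scheme above
# these names would be read from each module's metadata instead.
$validators = [
  'mymodule::verify_data_expectations',
  'othermodule::verify_data_expectations',
]

$validators.each |$function_name| {
  # call() invokes a function by name; assumed to need Puppet >= 5.0.
  call($function_name)
}
```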

Other alternatives open up when we are done with the meta-model in Puppet. The
hash mapping keys to types could be expressed that way and, if so
desired, transformed into some other schema that can be used with tools of
your choice to validate some data file in some format loaded by some
Hiera backend (i.e. without having to evaluate and run any Puppet
code). Meanwhile, puppet validate could call all of the functions.

Just some ideas about how to achieve the goal of validating data
expectations.

- henrik

--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

Trevor Vaughan

Feb 1, 2016, 2:48:25 PM
to puppe...@googlegroups.com
Hi Corey,

I needed to validate my data against a known set of Hiera and/or ENC data for compliance validation and did it with a function: https://github.com/trevor-vaughan/pupmod-compliance.

I would *love* to see something like this hit the core language, but there are quite a few cases where I have items that can be a Boolean, Number, or String (I'm still not loving needing to convert Numbers to Strings everywhere for consistency) so it gets difficult to use the Puppet 4 inbuilt validators.

The linked function certainly doesn't meet everyone's use case, but it fulfills my needs for the moment.

Thanks

Trevor

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699

-- This account not approved for unencrypted proprietary information --

Eli Young

Feb 1, 2016, 4:58:24 PM
to puppe...@googlegroups.com
On Mon, Feb 1, 2016 at 11:48 AM, Trevor Vaughan <tvau...@onyxpoint.com> wrote:
> I would *love* to see something like this hit the core language, but there are quite a few cases where I have items that can be a Boolean, Number, or String (I'm still not loving needing to convert Numbers to Strings everywhere for consistency) so it gets difficult to use the Puppet 4 inbuilt validators.


Variant[Boolean, Numeric, String] means "must be a Boolean, a number, or a String", which sounds like exactly what you want.
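
For illustration, a minimal sketch of a class parameter typed this way. The class and parameter names are hypothetical, and note that Puppet's abstract numeric type is spelled Numeric.

```puppet
# Hypothetical class, for illustration only.
class compliance_example (
  Variant[Boolean, Numeric, String] $setting = 'enabled',
) {
  notice("setting = ${setting}")
}

# Each of these declarations would pass the type check (one at a time):
#   class { 'compliance_example': setting => true }
#   class { 'compliance_example': setting => 42 }
#   class { 'compliance_example': setting => 'on' }
# An Array or Hash value would be rejected during catalog compilation.
```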

Trevor Vaughan

Feb 1, 2016, 9:18:00 PM
to puppe...@googlegroups.com
I'll give it a shot again (unfortunately, I have legacy 3.X users so updating to use 4.X features will take some time).

Honestly, I still haven't found a compelling reason for anything besides Booleans, Undef, and Strings. Even the stdlib code converts everything to a string due to the issues with dealing with Strings and Numbers together.

Are there any compelling cases that I'm missing out there?

Happy to fork this to a different thread.

Trevor


Rob Nelson

Feb 1, 2016, 9:27:58 PM
to puppe...@googlegroups.com
Those three types will be the majority of what you use, sure, but Optional and Enum are awesome. Pattern seems potent but may be difficult to use. Check out how this module uses the type system: https://github.com/jlambert121/jlambert121-puppet/blob/master/manifests/init.pp
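
A small sketch of those types in a class signature, separate from the linked module. The class name, parameters and the regular expression are made up for the example.

```puppet
# Hypothetical class showing Enum, Optional and Pattern side by side.
class example_service (
  # Only these two literal strings are accepted.
  Enum['present', 'absent']      $ensure      = 'present',
  # Either undef or a non-empty string.
  Optional[String[1]]            $config_file = undef,
  # A string that must match the regular expression.
  Pattern[/\A[a-z][a-z0-9_]*\z/] $user        = 'svc',
  # An integer restricted to the valid TCP port range.
  Integer[1, 65535]              $port        = 8080,
) {
  notice("example_service: ensure=${ensure} user=${user} port=${port}")
}
```

Declaring the class with, say, ensure => 'running' or port => 0 would fail at compile time with a type mismatch, before any resource is applied.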

Trevor Vaughan

Feb 2, 2016, 5:36:57 AM
to puppe...@googlegroups.com
Hi Rob,

Thanks for posting that, this is probably the best practical example that I've seen so far.

Trevor

