Data in Modules. Current status?

Alessandro Franceschi

Dec 17, 2013, 7:30:08 AM
to puppe...@googlegroups.com
Hi all,
I suppose most of you have read R.I.Pienaar's blog post about his module_data module:
http://www.devco.net/archives/2013/12/08/better-puppet-modules-using-hiera-data.php?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+planetpuppet+%28Planet+Puppet%29

I've played with it a bit and still haven't found any issues.
It's very easy and straightforward to implement (let's be honest, the ARM-9 approach was not as plainly easy).
This commit shows how quickly a module can be converted: http://bit.ly/18K6dSY

But the real advantage, at least for me, is not in such a common case (where params.pp may not be elegant but does what's needed), but in cases where some values that typically go in params.pp change not according to the OS but according to parameters passed to the module, such as installation method or version.

In such a case I found this approach to be a real killer.
An example is here: http://bit.ly/1gCkQQ8
Managing a similar case (Postgres' params when you want to use the official postgres.org repos) in the normal way would require a lot of extra code, which can't even stay in params.pp.
A hierarchy that looks like:
---
 :hierarchy:
   - "install_class/%{install_class}"
   - "operatingsystem/%{::operatingsystem}-%{::operatingsystemrelease}"
   - "operatingsystem/%{::operatingsystem}"
   - common
is enough to cover most of the weirdest cases where I wanted to provide modules with alternative install options.
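
To make the lookup order concrete, here is a minimal Python sketch of how a hierarchy like this could resolve a key. It is not Hiera's or module_data's actual code: the `mymodule::*` keys, the data values, and the `expand`/`lookup` helpers are all hypothetical, and the dictionary keys stand in for YAML files inside the module. The only assumption about behaviour is first-match lookup, with unset variables interpolating to an empty string.

```python
import re

# Hierarchy levels copied from the post above.
HIERARCHY = [
    "install_class/%{install_class}",
    "operatingsystem/%{::operatingsystem}-%{::operatingsystemrelease}",
    "operatingsystem/%{::operatingsystem}",
    "common",
]

# Hypothetical module data; each key stands in for one YAML file in the module.
DATA = {
    "install_class/mymodule::install::upstream": {
        "mymodule::package": "mytool-upstream",
    },
    "operatingsystem/Debian": {
        "mymodule::package": "mytool",
        "mymodule::config_dir": "/etc/mytool",
    },
    "common": {
        "mymodule::service": "mytool",
    },
}


def expand(level, scope):
    """Replace each %{var} or %{::var} with its value from scope ('' if unset)."""
    return re.sub(
        r"%\{(?:::)?([^}]+)\}",
        lambda match: scope.get(match.group(1), ""),
        level,
    )


def lookup(key, scope):
    """Return the value from the first hierarchy level that defines key."""
    for level in HIERARCHY:
        values = DATA.get(expand(level, scope), {})
        if key in values:
            return values[key]
    raise KeyError(key)


facts = {"operatingsystem": "Debian", "operatingsystemrelease": "7.2"}

# No install_class set: the first level expands to "install_class/" and misses,
# so the OS-level data answers.
default_install = lookup("mymodule::package", facts)

# install_class passed as a parameter: the first level now wins.
upstream_install = lookup(
    "mymodule::package",
    dict(facts, install_class="mymodule::install::upstream"),
)
```

In this sketch the same key gives a different package name depending only on the `install_class` value in scope, which is the case that would otherwise need extra conditional code outside params.pp.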

So, my point is:
if it's true (that's how I've understood it) that the data-in-modules solution introduced in 3.3.0 has been held back, is there any chance that this one might be introduced into core Puppet?

And, in any case, what's the current plan for Data in modules? 

Best,
Al

Eric Sorenson

Dec 17, 2013, 10:01:24 PM
to puppe...@googlegroups.com
Alessandro Franceschi wrote:
> Hi all,
> I suppose most of you have read R.I.Pienaar's blog post about his
> module_data module:
> http://www.devco.net/archives/2013/12/08/better-puppet-modules-using-hiera-data.php?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+planetpuppet+%28Planet+Puppet%29
>
> I've played a bit with it still haven't found any issue.
> It's very easy and straightforward to implement (let's be honest, ARM-9
> approach was not as plainly easy)
> This commit shows how quickly a module can be converted:
> http://bit.ly/18K6dSY

This is great, Alessandro. I actually hadn't seen that module-skeleton
before.

> But the real advantage, at least for me, is not for such a common case
> (where params.pp may not be elegant but does what's needed), but a case
> where some values which typically go in params.pp may change not
> according to the os but according to parameters passed to the module,
> such as installation method or version.
>
> In such a case I found this approach really a killer.
> An example is here: http://bit.ly/1gCkQQ8

I see that it moved the params into multiple yaml files. I guess this is
the core of it: if you see data in yaml as being superior, then having
more ubiquitous hiera+yaml seems like a natural fit. If you like the DSL
and programming-language features such as string interpolation, you
either end up putting those features into YAML (like the Hiera 1.3
lookup function) or massaging the return values once you're back in the
DSL.

> So, my point is:
> if it's true (that's how I've understood) that the data in module
> solution introduced in 3.3.0 has been held back, is there any chance
> that this one might be introduced into core Puppet?
>
> And, in any case, what's the current plan for Data in modules?

So let me recap from the start and talk a bit about where I think it's
going. Honestly, this is kind of a painful mail to write because in a
way it is a litany of my failure to manage this feature effectively.

RI's original pull request had a couple of minor issues that are in the
#20-#30 update range on #16856 and could have easily been fixed up after
merge. I should not have let it be overshadowed by the Binder
implementation and should not have let that be merged in without
following the ARM process, which we'd introduced in between the first
talk about this feature and the later developments. Looking back, this
merge caused a big problem because it created an unfair playing field
between competing implementations.

Then, the 3.3.0 implementation (Binder) got hammered in user testing and
then I failed at prioritizing its removal in time for 3.4.0. So that
implementation is in kind of Purgatory until the tickets to clean it up
can be addressed (these are linked under
https://tickets.puppetlabs.com/browse/PUP-42 ). The discussion from
October led to the tentative plan to introduce a simple, hiera-based DSL
data binding lookup but clearly I did not incorporate the demand for
YAML based data storage and, in any case, the development time for it
also got squeezed out as we got close to 3.4.0. So here we are.

A semi-ordered list of next steps:
- me: migrate the original ticket to JIRA to preserve history and invite
further comment - done, now at
https://tickets.puppetlabs.com/browse/PUP-1157

- community: please try RI's module_data in your modules and write up
your experience, especially with complex modules as I'm very worried
about the problem that Gary Larizza describes in his blog post about
deep hiera hierarchies becoming hard to reason about (
http://garylarizza.com/blog/2013/12/08/when-to-hiera/ under "What are
you trying to say"). I've talked to several sysadmins at large sites who
went all-in with Hiera and ended up with >15-layer hierarchies that were
incomprehensibly complex to the rest of their team (in a way, hierarchy
layout is just a super-compressed 'if' statement and the same problems
apply).

- all: try to work through the problems we know we have in data bindings
and hiera on the way to a complete solution, i.e.
https://tickets.puppetlabs.com/browse/HI-118
https://tickets.puppetlabs.com/browse/HI-46
and the general problem of expressing a dependency on the module_data
backend in a consuming module's metadata (today, being in a module is an
improvement over it being in core, because a DIM-enabled consumer can
explicitly state that dependency, which obviously isn't the greatest).

- me: stay on top of communicating these developments to a far better
degree so we're not still in this situation 6 months down the line. I'd
also like to make sure we're all trying to solve the same problem and will
start up a google doc to facilitate commenting and collaboration on
goals and acceptance criteria.

Hope this helps.

--
Eric Sorenson - eric.s...@puppetlabs.com - freenode #puppet: eric0
puppet platform // coffee // techno // bicycles

R.I.Pienaar

Dec 18, 2013, 5:47:55 AM
to puppe...@googlegroups.com
or both? Because like in other languages it seems perfectly reasonable to
load data then use your language features to manipulate, validate or derive
new data in the language rather than in the data.

The two are highly complementary not mutually exclusive.
True, deep hierarchies are hard to reason about, as are huge case statements or
if blocks and really anything that's hugely overused without upfront design.

Hiera almost promotes this because it leads you to think you can postpone
designing your data structure till you want to use it. But really your data
design is hugely important and requires a lot of thought and planning so
you are sure you understand it and it behaves in a way you can reason about.

This is true for any data store. Would love to hear what kind of approaches
you considered would alleviate the need for this or make it easier to have
data you can refactor later.

For one, I think a way to profile a compile and extract from it the data it
fetched, from where, and what interpolation it did would go a long way to help,
though I doubt it will resolve much. It kind of needs to be at compile time
since, as shown here, using in-scope variables in your hierarchy is sometimes
very valuable to achieve a level of logic in your hierarchy without embedding
code in it. So a snapshot of data per compile that you can look at or ask
'how was variable x::y resolved' would help quite a bit, and it shouldn't be
too hard.
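
As a rough illustration of that idea, and not an existing Puppet or Hiera feature, the following Python sketch records one trace entry per lookup so the question 'how was variable x::y resolved' can be answered afterwards. The function names (`interpolate`, `traced_lookup`, `explain`) and the record layout are assumptions made up for this example.

```python
import re


def interpolate(template, scope):
    """Replace each %{var} or %{::var} with its value from scope ('' if unset)."""
    return re.sub(
        r"%\{(?:::)?([^}]+)\}",
        lambda match: scope.get(match.group(1), ""),
        template,
    )


def traced_lookup(key, hierarchy, data, scope, trace):
    """First-match lookup that appends a record of how key was resolved."""
    tried = []
    for template in hierarchy:
        source = interpolate(template, scope)
        tried.append(source)
        if key in data.get(source, {}):
            value = data[source][key]
            trace.append({
                "key": key,
                "value": value,
                "template": template,  # hierarchy entry before interpolation
                "source": source,      # hierarchy entry after interpolation
                "tried": list(tried),  # every source consulted, in order
            })
            return value
    # Nothing matched: record the miss as well, so it can be inspected later.
    trace.append({"key": key, "value": None, "template": None,
                  "source": None, "tried": tried})
    return None


def explain(trace, key):
    """Return every recorded resolution of key from one compile's snapshot."""
    return [record for record in trace if record["key"] == key]


# Hypothetical data for one compile.
hierarchy = [
    "install_class/%{install_class}",
    "operatingsystem/%{::operatingsystem}",
    "common",
]
data = {
    "operatingsystem/Debian": {"x::y": "from-debian"},
    "common": {"x::y": "default", "x::z": "z-default"},
}
scope = {"operatingsystem": "Debian"}

snapshot = []
resolved = traced_lookup("x::y", hierarchy, data, scope, snapshot)
report = explain(snapshot, "x::y")
```

Each record keeps both the raw hierarchy entry and its interpolated form, since the in-scope variables are exactly what makes the result hard to predict by reading hiera.yaml alone.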


>
> - all: try to work through the problems we know we have in data bindings
> and hiera on the way to a complete solution, i.e.
> https://tickets.puppetlabs.com/browse/HI-118
> https://tickets.puppetlabs.com/browse/HI-46
> and the general problem of expressing a dependency on the module_data
> backend in a consuming module's metadata (today, being in a module is an
> improvement over it being in core, because a DIM-enabled consumer can
> explicitly state that dependency, which obviously isn't the greatest).
>
> - me: stay on top of communicating these developments to a far better
> degree so we're not still in this situation 6 months down the line. I'd
> also like to make sure we're all trying solve the same problem and will
> start up a google doc to facilitate commenting and collaboration on
> goals and acceptance criteria.
>
> Hope this helps.

Thanks Eric,

Alessandro Franceschi

Dec 18, 2013, 10:12:58 AM
to puppe...@googlegroups.com
I'll surely test RIP's solution more, but as I see the whole matter, the proliferation of hiera hierarchies is most often due to a wrong approach towards modules:
as Gary underlined, all OS-specific stuff should be in the module's domain (that's how I conceive a reusable module, and the first requirement for reusability is multiple OS support) and therefore in an eventual in-module data hierarchy.
All the business logic, which varies on each site and can't be standardised, is managed in the main hiera hierarchy.
To be honest, most of the documentation examples (official ones included) of sample hierarchies in hiera.yaml, which mostly used OS-related facts, have not helped people understand this (I remember a mostly-ignored-as-usual comment I made on this old post about hiera: http://puppetlabs.com/blog/first-look-installing-and-using-hiera ).

About the internal module hierarchy: up to now I haven't found situations where I needed to manage anything more than osfamily/operatingsystem/operatingsystemversion differences. The only notable exception is cases where different installation options (and the relevant class parameters) might involve an extra layer in the module's hierarchy (like the %{install_class} in the sample http://bit.ly/1gCkQQ8 ).
If someone has more complex cases, let's review them and see if they can be managed with RIP's module, considering that, as he himself said in other posts, we don't necessarily have to cover 100% of use cases; for corner situations nothing prevents us from continuing to use the solutions available in the Puppet DSL.
 

> - all: try to work through the problems we know we have in data bindings
> and hiera on the way to a complete solution, i.e.
> https://tickets.puppetlabs.com/browse/HI-118
> https://tickets.puppetlabs.com/browse/HI-46
> and the general problem of expressing a dependency on the module_data
> backend in a consuming module's metadata (today, being in a module is an
> improvement over it being in core, because a DIM-enabled consumer can
> explicitly state that dependency, which obviously isn't the greatest).
>
> - me: stay on top of communicating these developments to a far better
> degree so we're not still in this situation 6 months down the line. I'd
> also like to make sure we're all trying solve the same problem and will
> start up a google doc to facilitate commenting and collaboration on
> goals and acceptance criteria.
>
> Hope this helps.

Definitely.
Thank you for the exhaustive reply. I hope a solution to this core feature for modules will be sorted out in a reasonable time.
I also hope that whatever solution emerges can be used on earlier Puppet versions too (as RIP's module can).
Al
