Changes to PuppetDB to support proper environments

Ken Barber

Mar 28, 2014, 1:46:41 PM
to Puppet Users, puppe...@googlegroups.com
Hey all,

TL;DR: We're adding support for environments to PuppetDB but have a
small migration hassle we wanted some community opinion on. If you're
interested in PuppetDB and environments read on.

So we're looking at adding first class support to PuppetDB for
environments. In the past we would happily accept facts/reports &
catalogs from multiple different environments, but that information
was never stored with the data.

As a consequence, people trying to use the same PuppetDB instance for
multiple environments would find they are collecting globally across all
environments, creating a high chance of resource collisions. For
many people this just wouldn't work, and some probably found
horrible ways of working around it (if you are one of these people,
I'd like to hear from you).

We also have no way to represent this data in the PE console;
basically, PuppetDB has no environment awareness at all.

So we're fixing this now (at least the PDB part). Our current Epic,
https://tickets.puppetlabs.com/browse/PDB-47, and its child
tickets cover this work if you want to follow along at home.

The issue we've hit however is how to cleanly migrate from a world of
no environments, to one where environments are going to be
constraining collections. Let me break it down:

* Currently all submissions to PDB have no environment knowledge, and
we can't store it anyway.
* During an upgrade to this new feature we have to set the environment
of all items in our database to 'something', internally and externally;
however, without prior knowledge everything will be labelled the same.
* Once environment awareness is added to the terminus, we only want to
collect from the environment the agent transaction was run in. This
complies with the semantics of collections going back to whenever
storeconfigs & environments were added to Puppet.

One strategy for upgrades is to set the environment to 'production'
for existing data. This will stay like this until another catalog is
submitted with the true environment for that node.

Now the problem with that solution is related to people who might be
using exported resources (and somehow avoiding collection collisions):

* In a single environment setup, where the name of the environment is
not 'production'.
* In a multi-environment setup where environment names are quite
different (but only 1 is production).

For both of these cases, if we just default the environment to
'production' there will be a short time where collections will return
nothing for environments not called 'production' - until the next
catalog submission occurs. This could be detrimental, and we'd like to
understand how many users this might impact. Please let me know if
this is you.
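To make the gap concrete, here is a minimal Python model (not PuppetDB code) of the 'production' default. Rows are plain dicts; the field names (`certname`, `type`, `title`, `environment`) and both helper functions are hypothetical illustrations of the semantics described above, not PuppetDB's schema or API.

```python
def migrate_default_production(resources):
    """Option 1 (hypothetical model): label every pre-existing row 'production'."""
    return [dict(r, environment="production") for r in resources]


def collect_same_env(resources, rtype, agent_env):
    """Environment-aware collection: only rows tagged with the agent's environment."""
    return [r for r in resources
            if r["type"] == rtype and r["environment"] == agent_env]


# A node in 'staging' exported a host before the upgrade; nothing recorded its environment.
legacy = [{"certname": "web1", "type": "Nagios_host", "title": "web1"}]
db = migrate_default_production(legacy)

# Until web1 submits a new catalog, a 'staging' collector finds nothing.
print(collect_same_env(db, "Nagios_host", "staging"))  # []
```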

An alternative solution is to migrate existing data to have the
environment set to something completely different (like 'nil' or
something else that can't possibly collide with a real name). With
this solution, we can put in a collection query that collects not only
for the current environment, but also for 'nil' to pick up the older
global resources ... until all nodes have submitted new catalogs,
putting those catalogs in the correct environment.
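A minimal Python sketch of this second option, again a model rather than PuppetDB's implementation. `UNKNOWN_ENV` stands in for the 'nil' marker (here Python's `None`, which cannot be a real environment name), and `replace_catalog` assumes, for illustration only, that a node's new catalog replaces all of its previous rows.

```python
UNKNOWN_ENV = None  # hypothetical stand-in for the 'nil' / unknown-environment marker


def migrate_to_unknown(resources):
    """Option 2: mark every pre-existing row as 'environment unknown'."""
    return [dict(r, environment=UNKNOWN_ENV) for r in resources]


def collect_with_unknown(resources, rtype, agent_env):
    """Collect from the agent's environment plus legacy rows of unknown environment."""
    return [r for r in resources
            if r["type"] == rtype and r["environment"] in (agent_env, UNKNOWN_ENV)]


def replace_catalog(resources, certname, env, new_resources):
    """A fresh catalog replaces all of the node's rows, so its unknown rows vanish."""
    kept = [r for r in resources if r["certname"] != certname]
    return kept + [dict(r, certname=certname, environment=env) for r in new_resources]


db = migrate_to_unknown([{"certname": "web1", "type": "Nagios_host", "title": "web1"}])
print(len(collect_with_unknown(db, "Nagios_host", "staging")))  # 1: no gap after upgrade

db = replace_catalog(db, "web1", "production", [{"type": "Nagios_host", "title": "web1"}])
print(len(collect_with_unknown(db, "Nagios_host", "staging")))  # 0: now correctly scoped
```

Once every node has resubmitted, no row carries the marker any more, which is the "transient" property described in the next paragraph.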

This concept of 'nil' is almost transient (although it will be longer lived
for old reports, obviously); it's a temporary marker to say 'we don't
know the environment'. So in most cases, once all nodes have submitted
new catalogs, the concept disappears. No new data should be
added to this internal special environment.

Anyway - I'm looking for some feedback on these two alternative
solutions (or a third, more wacky solution - whatever :-). The first
one is obviously easiest and avoids leaving behind a transient
solution; the second, we think, solves this concern to some extent.

Any feedback or opinion on this would be much appreciated.

ken.

John Bollinger

Mar 28, 2014, 3:15:25 PM
to puppe...@googlegroups.com


On Friday, March 28, 2014 12:46:41 PM UTC-5, Ken Barber wrote:
> Hey all,
>
> TL;DR: We're adding support to environments to PuppetDB but have a
> small migration hassle we wanted some community opinion on. If you're
> interested in PuppetDB and environments read on.
>
> [...]
>
> Anyway - I'm looking for some feedback for these two alternate
> solutions (or a third more whacky solution - whatever :-). The first
> one is obviously easiest and avoids leaving behind a transient
> solution, the second we think solves this concern to some extent (at
> least we think so).
>
> Any feedback or opinion on this would be much appreciated.



As a matter of general principles, I am inclined to think that it is better to tell the truth to your software than to lie.  Your software is less likely to do unexpected things that way, and people are less likely to be confused.  As that applies to the PuppetDB situation, it means PuppetDB should not consider or report machines as being in a given environment when it's uncertain whether that's true.  I would thus recommend the second alternative, with its provision for recording that the environment is unknown.

Whatever problems may attend that approach, it's a steadier base to build on.  In this case, it also preserves information that otherwise would be lost, to wit: for which nodes the environment is affirmatively known, as opposed to guessed / defaulted.


John


Ken Barber

Mar 28, 2014, 4:05:23 PM
to puppe...@googlegroups.com
Thanks for your perspective John, much appreciated.
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to puppet-dev+...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-dev/078ee482-1771-482d-89f4-8090e7dc7fb7%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Spencer Krum

Mar 28, 2014, 6:42:22 PM
to puppe...@googlegroups.com
We've been simply tagging resources with env_${environment} both on export and collect.
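For readers unfamiliar with the trick, this is a rough Python model of it, not Puppet code: the exporting side adds an `env_<environment>` tag and the collecting side only accepts resources carrying its own tag. The helper names are hypothetical.

```python
def export_with_env_tag(resource, environment):
    """Exporter side: add an env_<environment> tag to the resource's tags."""
    tags = set(resource.get("tags", ())) | {f"env_{environment}"}
    return dict(resource, tags=tags)


def collect_by_env_tag(resources, rtype, environment):
    """Collector side: only accept resources tagged for this environment."""
    wanted = f"env_{environment}"
    return [r for r in resources if r["type"] == rtype and wanted in r["tags"]]


exported = [
    export_with_env_tag({"type": "Host", "title": "db1"}, "test"),
    export_with_env_tag({"type": "Host", "title": "db1"}, "production"),
]
print(len(collect_by_env_tag(exported, "Host", "production")))  # 1
```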



--
Spencer Krum
(619)-980-7820

Ken Barber

Mar 28, 2014, 9:52:26 PM
to puppe...@googlegroups.com
> We've been simply tagging resources with env_${environment} both on export
> and collect.

Which sounds like a reasonable work-around. In this case only option 2
would guarantee uninterrupted service for you, correct?

ken.

Spencer Krum

Mar 28, 2014, 10:02:58 PM
to puppe...@googlegroups.com
To be honest, I'm not sure it's a big deal. I'm okay with just re-initializing the database and pushing new puppetdb and modified code on the same day. The use here is to keep test stuff out of production only, not anything more complicated.

Ken Barber

Mar 29, 2014, 10:39:56 AM
to puppe...@googlegroups.com

Daniele Sluijters

Mar 29, 2014, 6:17:16 PM
to puppe...@googlegroups.com
The one issue I see with that approach is what happens when your monitoring system, *choke* nagios *choke* is dependent on said exported resources. Wiping the database clean isn't really an option in that case, or you'd need to temporarily disable reloading your monitoring until everyone's checked back in again. The same problems would occur if people were using those exported resources to configure things like HAProxy, Apache BalanceMembers or nginx upstream entities (to name a few).

I'm not entirely getting the resource collection collision thing, what's going on there?

Ken Barber

Mar 29, 2014, 10:57:47 PM
to puppe...@googlegroups.com
> The one issue I see with that approach is what happens when your monitoring
> system, *choke* nagios *choke* is dependant on said exported resources.
> Wiping the database clean isn't really an option in that case or you'd need
> to temporarily disable reloading your monitoring until everyone's checked
> back in again. The same problems would occur if people were using those
> exported resources to configure things like HAProxy, Apache BalanceMembers
> or nginx upstream entities (to name a few).

This is the world I see; it won't affect everyone though, and
theoretically with 1-hour check-ins it will be solved on the next run. My
fear is more around those who don't run Puppet as often.

> I'm not entirely getting the resource collection collision thing, what's
> going on there?

So because there was no environment awareness, all resources from all
environments would be sent to PuppetDB with no environment being
marked. As a consequence, no matter which environment you were
collecting from, all resources on that PuppetDB instance were up
for collection. Without work-arounds (like what Spencer mentioned) you
may potentially collect both test and prod resources (for example)
that represented the same Type/Title combination, thus creating a
duplicate resource.
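A toy Python model of the collision (resources as dicts, not Puppet internals): since a catalog may hold only one resource per Type/Title pair, collecting the same pair from two environments yields a duplicate, while filtering on environment leaves a single copy.

```python
from collections import Counter


def duplicate_keys(collected):
    """Return the (type, title) pairs that appear more than once in a collection."""
    counts = Counter((r["type"], r["title"]) for r in collected)
    return sorted(key for key, n in counts.items() if n > 1)


exported = [
    {"type": "Host", "title": "db.example.com", "environment": "test"},
    {"type": "Host", "title": "db.example.com", "environment": "production"},
]

# Global collection sees both copies: compilation would fail on a duplicate.
print(duplicate_keys(exported))  # [('Host', 'db.example.com')]

# Restricting to the agent's environment leaves a single copy.
print(duplicate_keys([r for r in exported if r["environment"] == "production"]))  # []
```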

The idea is to make this all transparent and work like the old days
of storeconfigs, forcing collection to only be on the environment
you are running in, thus removing the need for a work-around.

Possibly too much 'thus' in that explanation, let me know if it makes sense.

ken.

Erik Dalén

Mar 30, 2014, 4:19:49 AM
to Puppet Developers
I'm wondering if it will be possible to disable this behaviour on resource collections in the terminus?
For us puppet environments are mapped to git branches, and actual environments (like testing and prod) have different Puppet CAs and PuppetDBs (it is really best to not mix them anyway).
So it is all right if we record the environment (git branch), but we want to collect across all of them. And actually this might be useful during a migration period for others as well, until all nodes have sent in their new catalogs with environment info.


Also, 2.x agents send their environment as a fact, so for them you could migrate the data.
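A rough Python sketch of that migration, assuming (hypothetically) that stored facts are available as a per-node dict and that nodes without an `environment` fact fall back to an unknown marker; none of these names come from PuppetDB.

```python
UNKNOWN_ENV = None  # hypothetical marker for nodes whose environment was never reported


def backfill_environment(resources, facts_by_node):
    """Label each stored row with its node's 'environment' fact, if one exists."""
    migrated = []
    for r in resources:
        node_facts = facts_by_node.get(r["certname"], {})
        migrated.append(dict(r, environment=node_facts.get("environment", UNKNOWN_ENV)))
    return migrated


rows = [
    {"certname": "old-agent", "type": "File", "title": "/etc/motd"},
    {"certname": "new-agent", "type": "File", "title": "/etc/issue"},
]
facts = {"old-agent": {"environment": "staging"}, "new-agent": {"osfamily": "Debian"}}
print([r["environment"] for r in backfill_environment(rows, facts)])  # ['staging', None]
```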



--
Erik Dalén

Daniele Sluijters

Mar 30, 2014, 3:06:57 PM
to puppe...@googlegroups.com
> This is the world I see, it won't affect everyone though and 
> theoretically with 1 hour check-ins it will be solved next run. My 
> fear is more around those that don't run puppet as often. 

Oh. I was under the impression most people run Puppet at least every half hour, if not even more often. I'm likely off the target by a few hours though.

> So because there was no environment awareness, all resources from all 
> environments would be sent to PuppetDB with no environment being 
> marked. As a consequence, no matter what environment you were 
> collecting from all resources on that PuppetDB instance were thus up 
> for collection. Without work-arounds (like what Spencer mentioned) you 
> may potentially collect both test and prod resources (for example) 
> that represented the same Type/Title combination thus creating a 
> duplicate resource. 

Ah. Interestingly enough I never ran into this. Kinda makes me wonder if that's just dumb luck.

I'm all for only collecting from the environment you're from, but there needs to be a way to override this. No matter the environment, I still want all my machines monitored by my Nagios instance, which happens to be running in the production environment.

-- 
Daniele Sluijters

Ken Barber

Mar 30, 2014, 10:43:20 PM
to puppe...@googlegroups.com
> I'm wondering if it will be possible to disable this behaviour on resource
> collections in the terminus?
> For us puppet environments are mapped to git branches, and actual
> environments (like testing and prod) have different Puppet CAs and PuppetDBs
> (it is really best to not mix them anyway).
> So it is all right if we record the environment (git branch), but we want to
> collect across all of them. And actually this might be useful during a
> migration period for others as well, until all nodes have sent in their new
> catalogs with environment info.

I think this is a reasonable request, and I think it does provide a
mechanism for others to temporarily work around the lack of
cross-environment search in the language - a feature that is quite
valid, and probably also belongs in the language (i.e. collect with an
environment == clause, for example). Can you update
https://tickets.puppetlabs.com/browse/PDB-455 with this idea? I'm
working on the code now, and it can be easily added, so it's worth it.

> Also, 2.x agents sends their environment as a fact, so for them you could
> migrate the data.

But not 3.x agents. This actually would have been a great idea (and we
could have used this to migrate the catalog), but we're dropping 2.x
support in PDB 2.x. It might get some users though ... I'm dithering
on whether a ticket should exist for this investigation now.

ken.

Ken Barber

Mar 30, 2014, 10:49:20 PM
to puppe...@googlegroups.com
>> This is the world I see, it won't affect everyone though and
>> theoretically with 1 hour check-ins it will be solved next run. My
>> fear is more around those that don't run puppet as often.
>
> Oh. I was under the impression most people run Puppet at least every half
> hour, if not even more often. I'm likely off the target by a few hours
> though.

This certainly is becoming more and more true over time, but it wasn't
always that way at all, in fact quite the opposite. Not to mention new
users may be less interested in running regularly due to wariness of
the technology. In short we cater for both cases where we can.

>> So because there was no environment awareness, all resources from all
>> environments would be sent to PuppetDB with no environment being
>> marked. As a consequence, no matter what environment you were
>> collecting from all resources on that PuppetDB instance were thus up
>> for collection. Without work-arounds (like what Spencer mentioned) you
>> may potentially collect both test and prod resources (for example)
>> that represented the same Type/Title combination thus creating a
>> duplicate resource.
>
> Ah. Interestingly enough I never ran into this. Kinda makes me wonder if
> that's just dumb luck.

I'd be curious to hear why you think it succeeded. Like you say,
probably just lucky. Also, the less you use collected resources,
the lower the chance of hitting it.

> I'm all for only collection from the environment you're from but there needs
> to be a way to override this. No matter the environment I still want all my
> machines monitored by my Nagios instance which happens to be running on the
> production environment.

I concur with the sentiments and I think the crowd has spoken. I'm
pretty sure we'll just add a configuration option for this.

ken.

Ken Barber

Mar 30, 2014, 10:53:39 PM
to puppe...@googlegroups.com
>> I'm all for only collection from the environment you're from but there needs
>> to be a way to override this. No matter the environment I still want all my
>> machines monitored by my Nagios instance which happens to be running on the
>> production environment.
>
> I concur with the sentiments and I think the crowd has spoken. I'm
> pretty sure we'll just add a configuration option for this.

Further to this, I'd say there is a need to add an environment ==
clause during collection as well, so that the language provides the
ability to do something finer grained. It's out of scope for our work,
no doubt; probably more of a future PUP ticket or armature.

ken.

Steven Kurylo

Mar 31, 2014, 1:33:12 PM
to puppe...@googlegroups.com
On Sun, Mar 30, 2014 at 7:49 PM, Ken Barber <k...@puppetlabs.com> wrote:
>> I'm all for only collection from the environment you're from but there needs
>> to be a way to override this. No matter the environment I still want all my
>> machines monitored by my Nagios instance which happens to be running on the
>> production environment.
>
> I concur with the sentiments and I think the crowd has spoken. I'm
> pretty sure we'll just add a configuration option for this.


We also collect across multiple production instances which have unique environments.  Nagios is one example, but we have several other modules which do the same.

We keep dev on a different puppet master, so we haven't had issues with collisions. 

Cheers

Ken Barber

Mar 31, 2014, 2:19:25 PM
to puppe...@googlegroups.com
>> > I'm all for only collection from the environment you're from but there
>> > needs
>> > to be a way to override this. No matter the environment I still want all
>> > my
>> > machines monitored by my Nagios instance which happens to be running on
>> > the
>> > production environment.
>>
>> I concur with the sentiments and I think the crowd has spoken. I'm
>> pretty sure we'll just add a configuration option for this.
>>
>
> We also collect across multiple production instances which have unique
> environments. Nagios being one example, but have several other modules
> which do the same.
>
> We keep dev on a different puppet master, so we haven't had issues with
> collisions.

So what if we created a configuration item, say
'collection_environments' and had it accept two different options:
'same' and 'all'.

The default is up for argument, but 'all' would keep the existing behaviour.
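A minimal Python sketch of how such a setting might gate collection. Only the option name and its two values come from the proposal above; the function, the data shape and the choice of 'all' as the default (since it keeps existing behaviour) are assumptions for illustration.

```python
VALID_MODES = ("same", "all")


def collect(resources, rtype, agent_env, collection_environments="all"):
    """Filter exported resources according to a collection_environments setting.

    'all'  -> collect across every environment (the existing behaviour)
    'same' -> collect only from the agent's own environment
    """
    if collection_environments not in VALID_MODES:
        raise ValueError(f"collection_environments must be one of {VALID_MODES}")
    return [r for r in resources
            if r["type"] == rtype
            and (collection_environments == "all" or r["environment"] == agent_env)]


db = [
    {"type": "Nagios_host", "title": "app1", "environment": "production"},
    {"type": "Nagios_host", "title": "dev1", "environment": "dev"},
]
print([r["title"] for r in collect(db, "Nagios_host", "production")])          # ['app1', 'dev1']
print([r["title"] for r in collect(db, "Nagios_host", "production", "same")])  # ['app1']
```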

ken.

John Bollinger

Mar 31, 2014, 3:55:27 PM
to puppe...@googlegroups.com


Coarser grained too, perhaps?  That is, for the case where puppetdb is configured with collection_environments = 'same', does it not make sense to support, say,

    Nagios_host<<| environment == * |>>

to collect resources from all environments?  Or maybe the smoothest path would be to keep the default behavior of collecting from every environment, but support an 'environment' key in collectors by which to narrow that to a single environment.  That would retain current behavior for all current code.
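Both variants can be modelled with one predicate. This Python sketch is illustrative only: the `clause` and `configured` arguments are hypothetical, with `None` meaning the collector has no environment term and `'*'` the wildcard suggested above.

```python
def environment_matches(resource_env, agent_env, clause=None, configured="all"):
    """Decide whether an exported resource's environment passes a collector.

    clause:     None if the collector has no environment term, '*' for
                "any environment", or a specific environment name.
    configured: the server-wide default ('all' or 'same') used when clause is None.
    """
    if clause == "*":
        return True                    # explicit wildcard overrides the default
    if clause is not None:
        return resource_env == clause  # explicit name narrows to one environment
    return configured == "all" or resource_env == agent_env


# configured='same' plus an explicit wildcard still reaches other environments.
print(environment_matches("dev", "production", clause="*", configured="same"))  # True
```

With `configured="all"` and no clause, existing collectors keep matching everything, which corresponds to the second, "smoothest path" variant.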

Upon reflection, I think it would be wise to control which resources are collected strictly via search expressions.  I disfavor a configuration setting affecting that, because if there were one then it would be likely that different modules would be developed with different assumptions about the configured behavior.  Or to put it a different way, people should not need to refer to Puppet's configuration to determine what any given snippet of DSL code means.

Do consider also that either form of search expression involving 'environment' represents a non-trivial change in search scope.  Currently, search expressions consider only attributes of candidate resources, whereas environment is an attribute of the node on behalf of which the resource is exported.  That's by no means an inherent problem, but it does open the door to requests for other collection behaviors based on the characteristics of the exporting node.  (First will be collection by exporting node identity, then by nodes declaring certain classes or resources, then ....)  Be ready.


John

Ken Barber

Mar 31, 2014, 4:47:18 PM
to puppe...@googlegroups.com
> Coarser grained too, perhaps? That is, for the case where puppetdb is
> configured with collection_environments = 'same', does it not make sense to
> support, say,
>
> Nagios_host<<| environment == * |>>
>
> to collect resources from all environments? Or maybe the smoothest path
> would be to keep the default behavior of collecting from every environment,
> but support an 'environment' key in collectors by which to narrow that to a
> single environment. That would retain current behavior for all current
> code.
>
> Upon reflection, I think it would be wise to control which resources are
> collected strictly via search expressions. I disfavor a configuration
> setting affecting that, because if there were one then it would be likely
> that different modules would be developed with different assumptions about
> the configured behavior. Or to put it a different way, people should not
> need to refer to Puppet's configuration to determine what any given snippet
> of DSL code means.

I guess the problem such a solution leaves is that people who really
do want global collection would have to litter their code with explicit
environment == '*' clauses. It would be interesting to know how many
people this affects; it seems to be at least 2 people in the responses
so far. Any ideas for that?

> Do consider also that either form of search expression involving
> 'environment' represents a non-trivial change in search scope. Currently,
> search expressions consider only attributes of candidate resources, whereas
> environment is an attribute of the node on behalf of which the resource is
> exported. That's by no means an inherent problem, but it does open the door
> to requests for other collection behaviors based on the characteristics of
> the exporting node. (First will be collection by exporting node identity,
> then by nodes declaring certain classes or resources, then ....) Be ready.

Indeed.

ken.

Ken Barber

Mar 31, 2014, 9:43:55 PM
to puppe...@googlegroups.com
>> Upon reflection, I think it would be wise to control which resources are
>> collected strictly via search expressions. I disfavor a configuration
>> setting affecting that, because if there were one then it would be likely
>> that different modules would be developed with different assumptions about
>> the configured behavior. Or to put it a different way, people should not
>> need to refer to Puppet's configuration to determine what any given snippet
>> of DSL code means.
>
> I guess the problem such a solution leaves is, that people who really
> do want global collection have to litter their code with explicit
> environment == '*' clauses. How many people will get affected by this
> would be interesting, seems at least 2 people on the response so far.
> Any ideas for that?

That said, I shouldn't speak for others. For those who do currently
use global resource collection, how would such a solution affect you?
And any other thoughts you can think of ...

ken.

Spencer Krum

Mar 31, 2014, 10:53:19 PM
to puppe...@googlegroups.com
Works for me.

Jeremy T. Bouse

Mar 31, 2014, 11:54:35 PM
to puppe...@googlegroups.com
For me simply having the environment as a variable that I'm able to
search on is solution enough. The current situation that I have is that
I have 2 puppet masters that I'd like to be able to use the same
PuppetDB instance. I use Hiera to control deployments and have the
simple configuration:

    puppet::agent::puppet_server::_nodequery:
      query: 'Class[puppet::master]'
      fact: 'fqdn'

in my common.yaml file to pull out the FQDN of the node defined with
the puppet::master class. I would love to be able to update the query to
include the environment and be able to have both environments use the
same instance. As I deploy the ssh known_hosts via exported resources
and all the nodes regardless of which environment would be nice to have
realized on all hosts within my network, I wouldn't care about the
environment those resources are associated with as they are part of my
larger domain as a whole.

David Schmitt

Apr 1, 2014, 1:56:03 AM
to puppe...@googlegroups.com
In my mind there are two big use-cases that are covered by
environments: keeping systems with the same code but different arguments
separate and keeping systems with different code separate. The latter
does not work at the moment due to fundamental problems within the
puppetmaster (global loading of custom types (fixable, but hard, I
understand), and cross-environment differences in the structure of
collected resources (not fixable)).

Therefore I would love to see puppetlabs officially denounce running
structurally differing environments on the same puppetmaster (or create
stronger borders, although I'm worried about the practical side of
this). This allows a default collection across environments without
remorse and will keep the syntax clean.

Generic modules will have to expose an explicit tag/scope parameter for
internally used exported resources anyway, to allow for different use
cases.



Regards, David

Erik Dalén

Apr 1, 2014, 5:11:38 AM
to Puppet Developers
This seems a bit backwards to me: for all other parts of the query you just leave them out if you don't want to match on them. There's no need for an explicit tags=='*' if you want to match on all tags, for example. So I don't see why environment matching should work the opposite way.

So I'm proposing instead that you add environment==$::environment to your query if you want to collect only from your current environment.

Daniele Sluijters

Apr 1, 2014, 9:05:20 AM
to puppe...@googlegroups.com
I... kinda like that suggestion. I would keep current behaviour intact, so collection would work 'as expected though weirdly' and not break current manifests. People who are up to date on this can explicitly select an environment to collect.

I also think that this approach works better for community modules. What if your module ships with its own native type and you want to collect on those? If you need to explicitly pass an environment for collection to even work, what do you pass: 'production', and hope that everyone works from that environment?

Ken Barber

Apr 1, 2014, 9:59:22 AM
to puppe...@googlegroups.com
> I... kinda like that suggestion. I would keep current behaviour intact, so
> collection would work 'as expected though weirdly' and not break current
> manifests. People who are up to date on this can explicitly select an
> environment to collect.
>
> I also think that this approach works better for community modules. What if
> your module ships with it's own native type and you want to collect on
> those? If you need to explicitly pass an environment for collection to even
> work, what do you pass, 'production' and hope that everyone works from that
> environment?

This is a good point regarding "How do we expect environment
searchability to be applicable in modules?" This kind of feels like
the concept of baking in stages into modules. While on the surface
this might seem neat, does this create assumptions about how you build
out your environments into a module?

>> This seems a bit backwards to me, for all other parts of the query you
>> just leave it out if you don't want to match on it. There's no need for a
>> explicit tags=='*' if you want to match on all tags for example. So I don't
>> see why environment matching would work the opposite way.
>>
>> So I'm proposing instead that you add environment==$::environment to your
>> query if you want to collect only from your current environment.

How would you do this with 3rd party modules though? Modify them?

We need to be wary of people who do just want real separation; no one
has formally chimed in about this yet (this conversation has been
mainly on puppet-dev).

ken.

Henrik Lindberg

Apr 1, 2014, 10:49:11 AM
to puppe...@googlegroups.com
On 2014-01-04 11:11, Erik Dalén wrote:
> On 1 April 2014 03:43, Ken Barber <k...@puppetlabs.com
Going forward, I imagine making the query operations in Puppet more
flexible and that the query operates on a container of some sort; either
all environments, or current environment. Further, the type of the
result narrows to an even smaller container. This can be expressed using
the type system. I can imagine an Environment[E] type where E is the
name of the environment, and Environment[default] is the current
environment. Then, the Environment type can be further parameterized
with a type. e.g. search for Foo instances in environment test is
expressed as:

Environment[test, Foo] <spaceship>

Just throwing 2c into the pond...

- henrik

David Schmitt

Apr 1, 2014, 11:31:46 AM
to puppe...@googlegroups.com
On 2014-04-01 15:59, Ken Barber wrote:
>> I... kinda like that suggestion. I would keep current behaviour intact, so
>> collection would work 'as expected though weirdly' and not break current
>> manifests. People who are up to date on this can explicitly select an
>> environment to collect.
>>
>> I also think that this approach works better for community modules. What if
>> your module ships with it's own native type and you want to collect on
>> those? If you need to explicitly pass an environment for collection to even
>> work, what do you pass, 'production' and hope that everyone works from that
>> environment?
>
> This is a good point regarding "How do we expect environment
> searchability to be applicable in modules?" This kind of feels like
> the concept of baking in stages into modules. While on the surface
> this might seem neat, does this create assumptions about how you build
> out your environments into a module?

In the modules I remember writing, I've always used tags and naked tag
queries to export/collect resources specific to the module's function.

The most complex setup is this munin module, which uses double
indirection through a class parameter to get to the actual tag value:


https://github.com/DavidS/puppet-munin/blob/master/manifests/init.pp#L556
->
https://github.com/DavidS/puppet-munin/blob/master/manifests/init.pp#L381
->
https://github.com/DavidS/puppet-munin/blob/master/manifests/init.pp#L16
and
https://github.com/DavidS/puppi/blob/master/lib/puppet/parser/functions/get_magicvar.rb

Having to consider environments here too, would be annoyingly complex.

See also my other mail.

Regards, David


Erik Dalén

Apr 1, 2014, 11:35:15 AM
to Puppet Developers
On 1 April 2014 15:59, Ken Barber <k...@puppetlabs.com> wrote:
>> I... kinda like that suggestion. I would keep current behaviour intact, so
>> collection would work 'as expected though weirdly' and not break current
>> manifests. People who are up to date on this can explicitly select an
>> environment to collect.

Well, it breaks resources that have a parameter called "environment", as environment is an attribute of the resource object in PuppetDB, not a resource parameter.

In puppetdbquery I've been thinking about using a dot in front to query attributes of the object in question. So this would be .environment instead of environment (and then you can also search for things like sourceline using .sourceline, etc.)
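A small Python sketch of how the dot prefix would disambiguate the two meanings. The syntax is only the idea floated here, not an existing puppetdbquery feature, and the resource layout (separate `attributes` and `parameters` dicts) is hypothetical.

```python
def parse_term(term):
    """Split 'name == value' or '.name == value' into (kind, name, value).

    A leading dot selects an attribute of the stored resource object
    (e.g. .environment, .sourceline); no dot means a resource parameter.
    """
    left, sep, right = term.partition("==")
    if not sep:
        raise ValueError(f"expected '==' in {term!r}")
    field, value = left.strip(), right.strip().strip("'\"")
    if field.startswith("."):
        return ("attribute", field[1:], value)
    return ("parameter", field, value)


def term_matches(resource, term):
    """Evaluate one parsed term against the attribute or parameter it names."""
    kind, name, value = parse_term(term)
    source = resource["attributes"] if kind == "attribute" else resource["parameters"]
    return source.get(name) == value


# A defined type that happens to take its own 'environment' parameter.
res = {"attributes": {"environment": "production", "sourceline": "12"},
       "parameters": {"environment": "dev"}}
print(term_matches(res, ".environment == production"))  # True
print(term_matches(res, "environment == 'dev'"))        # True
```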
 
>>
>> I also think that this approach works better for community modules. What if
>> your module ships with it's own native type and you want to collect on
>> those? If you need to explicitly pass an environment for collection to even
>> work, what do you pass, 'production' and hope that everyone works from that
>> environment?
>
> This is a good point regarding "How do we expect environment
> searchability to be applicable in modules?" This kind of feels like
> the concept of baking in stages into modules. While on the surface
> this might seem neat, does this create assumptions about how you build
> out your environments into a module?

>> This seems a bit backwards to me, for all other parts of the query you
>> just leave it out if you don't want to match on it. There's no need for an
>> explicit tags=='*' if you want to match on all tags for example. So I don't
>> see why environment matching would work the opposite way.
>>
>> So I'm proposing instead that you add environment==$::environment to your
>> query if you want to collect only from your current environment.

How would you do this with 3rd party modules though? Modify them?

Well, they can accept a collect_environment parameter that could default to '*' or so and use that in the query.
 
We need to be wary of people who do just want real separation; no one
has formally chimed in about this yet (this conversation has been
mainly on puppet-dev).

ken.
--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Erik Dalén

Steven Kurylo

Apr 1, 2014, 11:49:34 AM
to puppe...@googlegroups.com
On Mon, Mar 31, 2014 at 1:47 PM, Ken Barber <k...@puppetlabs.com> wrote:
> Coarser grained too, perhaps?  That is, for the case where puppetdb is
> configured with collection_environments = 'same', does it not make sense to
> support, say,
>
>     Nagios_host<<| environment == * |>>
>
> to collect resources from all environments?  Or maybe the smoothest path
> would be to keep the default behavior of collecting from every environment,
> but support an 'environment' key in collectors by which to narrow that to a
> single environment.  That would retain current behavior for all current
> code.
>
> Upon reflection, I think it would be wise to control which resources are
> collected strictly via search expressions.  I disfavor a configuration
> setting affecting that, because if there were one then it would be likely
> that different modules would be developed with different assumptions about
> the configured behavior.  Or to put it a different way, people should not
> need to refer to Puppet's configuration to determine what any given snippet
> of DSL code means.

I guess the problem such a solution leaves is that people who really
do want global collection have to litter their code with explicit
environment == '*' clauses. It would be interesting to know how many
people this would affect; it seems to be at least two of the respondents
so far. Any ideas for that?


Since the environment documentation talks about serving different modules for different environments, I think it makes more sense to have collection be local to the environment.  I wouldn't expect environments to interact with each other.  If the modules, manifests, and virtual resources are specific to an environment, why wouldn't exported resources be the same?  With no environment specified, I would expect collection to default to the current environment.

I'm OK with adding environment == '*' to my modules when I need cross-environment collection.  While I do it in several modules, it's the exception, not the rule.

Ken Barber

Apr 1, 2014, 11:52:24 AM
to puppe...@googlegroups.com
>> >> This seems a bit backwards to me, for all other parts of the query you
>> >> just leave it out if you don't want to match on it. There's no need for
>> >> an explicit tags=='*' if you want to match on all tags for example. So I
>> >> don't see why environment matching would work the opposite way.
>> >>
>> >> So I'm proposing instead that you add environment==$::environment to
>> >> your query if you want to collect only from your current environment.
>>
>> How would you do this with 3rd party modules though? Modify them?
>
>
> Well, they can accept a collect_environment parameter that could default to
> '*' or so and use that in the query.

Isn't that the opposite of what you've been asking for as the default?
Do you mean "environment == $::environment" here, or have you changed
your mind about the default?

ken.

Erik Dalén

Apr 1, 2014, 12:02:43 PM
to Puppet Developers
I think the default when you leave out a parameter from a query should be to not match on that parameter, like it is today, and that environment shouldn't be treated differently from any other parameter you can match on in this regard.

If a third-party module wants to match on an environment and wants the user to be able to specify it, it can have a collect_environment parameter and add something like environment==$collect_environment to its collection query. Whether the collect_environment parameter defaults to '*' or $::environment is really up to the module, but perhaps there could be a best practice around that.
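To make the proposed semantics concrete, here is a minimal Python sketch (an illustration only; the real PuppetDB terminus is Ruby) of how a collector plus an optional collect_environment value might be translated into a query. The function name build_collection_query and the convention that None or '*' means "add no environment clause" are assumptions; the clause shapes copy the PuppetDB query quoted later in this thread.

```python
from typing import Any, List, Optional

Query = List[Any]


def build_collection_query(
    resource_type: str,
    collect_environment: Optional[str] = None,
    requesting_certname: Optional[str] = None,
) -> Query:
    """Translate an exported-resource collector into a PuppetDB-style query.

    Hypothetical helper: collect_environment of None or "*" adds no
    environment clause, so omitting it behaves like omitting any other term.
    """
    clauses: Query = [
        ["=", "type", resource_type],
        ["=", "exported", True],
    ]
    if requesting_certname is not None:
        # Skip the node's own exports; Puppet collects those in-process.
        clauses.append(["not", ["=", "certname", requesting_certname]])
    if collect_environment not in (None, "*"):
        clauses.append(["=", "environment", collect_environment])
    return ["and", *clauses]


# A module could expose collect_environment as a class parameter and pass
# it straight through; which default it picks is up to the module author.
print(build_collection_query("Nagios_host"))
print(build_collection_query("Nagios_host", collect_environment="production"))
```

The design mirrors Erik's argument: leaving a term out never narrows the result, so existing collectors keep collecting across all environments.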

Ken Barber

Apr 1, 2014, 12:09:23 PM
to puppe...@googlegroups.com
>> Isn't that the opposite of what you've been asking for as the default?
>> Do you mean "environment == $::environment" here, or have you changed
>> your mind about the default?
>>
>
> I think the default when you leave out a parameter from a query should be to
> not match on that parameter, like it is today. And that environment
> shouldn't be different than any other parameter you can match on in this
> regard.

Are there any others who would say this is their preferred default? If yes, why?

ken.

Deepak Giridharagopal

Apr 1, 2014, 1:44:00 PM
to puppe...@googlegroups.com
So, for science, Ken and I just looked at how the old,
ActiveRecord-based storeconfigs worked with respect to
environments. We found a few things of note:

* it collects across all environments, with no way to override

* there is an "environment" field in the db schema, but it's only
attached to "host" objects

* even so, it doesn't appear that this field actually captures the
puppet client's environment anyways

For the curious, here's the code that would translate a collection
query into an activerecord query:

https://github.com/puppetlabs/puppet/blob/master/lib/puppet/indirector/resource/active_record.rb#L34

Notice that the code only supports comparisons against title, tags,
and parameter names/values. This was surprising to me, but the code's
the code. :)

As we've been collecting from all environments all along, my
inclination is to continue to work that way so as to preserve
compatibility and generally not surprise people.

I quite like the idea of allowing people to restrict collection based
on environment. That requires a slight tweak to the puppetdb terminus
code, but I don't think it'll be too bad. Erik is correct, though,
that we can't really use "environment" as the search term there
because there are some resources that use "environment" as a parameter
name (like "exec"). :/

--
deepak / puppet labs / engineering

John Bollinger

Apr 1, 2014, 3:35:54 PM
to puppe...@googlegroups.com


Yes, that's what I was suggesting in my earlier comments.  That, and to achieve restriction by environment only via a selection predicate; i.e. no way to configure a different default.  That should provide a smooth transition because the behavior of existing collections relative to multiple environments will not change.  Also, it fits my mental model better to limit results by adding filters, as opposed to expanding results by removing filters.

And it is not clear to me in general that limiting searches by environment is necessarily the behavior that is expected or most desired.  I couldn't have told you what the previous behavior was in that regard, but now that it's been described, I don't find it at all surprising.


John

Ken Barber

Apr 2, 2014, 10:40:59 AM
to puppe...@googlegroups.com
> I quite like the idea of allowing people to restrict collection based
> on environment. That requires a slight tweak to the puppetdb terminus
> code, but I don't think it'll be too bad. Erik is correct, though,
> that we can't really use "environment" as the search term there
> because there are some resources that use "environment" as a parameter
> name (like "exec"). :/

Does anyone have any suggestions for avoiding this conflict?

ken.

Erik Dalén

Apr 2, 2014, 11:14:17 AM
to Puppet Developers
As I mentioned, in puppetdbquery I've been thinking about using properties prefixed with a dot, so .environment for this and just environment for the exec parameter. Not sure how well this would work in Puppet itself though, but maybe something similar could be done.

But it would be good if both query methods used the same notation to keep things simple. So I'm open to ideas regarding puppetdbquery as well.
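A rough Python sketch of how the leading-dot convention could disambiguate the two meanings of "environment", assuming a simplified stored record in which environment and sourceline are attributes of the resource object and everything else lives in a parameters map. StoredResource, OBJECT_ATTRIBUTES, lookup and matches are hypothetical names, not puppetdbquery's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Attributes of the stored resource object (not resource parameters).
# A hypothetical subset chosen for illustration.
OBJECT_ATTRIBUTES = {"type", "title", "environment", "sourceline"}


@dataclass
class StoredResource:
    """Simplified, hypothetical model of an exported resource in PuppetDB."""
    type: str
    title: str
    environment: str  # environment of the catalog that exported it
    sourceline: int
    parameters: Dict[str, Any] = field(default_factory=dict)


def lookup(resource: StoredResource, term: str) -> Any:
    """Resolve a query term using the leading-dot convention.

    ".environment" reads the object attribute, while a bare "environment"
    reads the resource parameter of that name (e.g. Exec's environment).
    """
    if term.startswith("."):
        attr = term[1:]
        return getattr(resource, attr) if attr in OBJECT_ATTRIBUTES else None
    return resource.parameters.get(term)


def matches(resource: StoredResource, term: str, value: Any) -> bool:
    return lookup(resource, term) == value


exec_resource = StoredResource(
    type="Exec",
    title="rebuild-index",
    environment="production",
    sourceline=42,
    parameters={"command": "/usr/local/bin/rebuild", "environment": ["LANG=C"]},
)
print(matches(exec_resource, ".environment", "production"))  # object attribute
print(matches(exec_resource, "environment", ["LANG=C"]))      # Exec parameter
```

Both lookups succeed on the same resource, which is the point of the notation: the catalog's environment and Exec's environment parameter no longer compete for one name.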

--
Erik Dalén

Daniele Sluijters

Apr 3, 2014, 5:49:39 AM
to puppe...@googlegroups.com
I always found it surprising and slightly confusing that exec has an 'environment' attribute; I expected that to be a metaparam that could be applied to any resource to indicate what environment it should belong to (defaulting to the current one). It would get freaky, as you could then export resources across environments, but there are cases where that could be useful.

How is this discrepancy currently handled in the Nagios types? I know that those types have attributes that clash with metaparams, alias for example.

Ken Barber

Apr 7, 2014, 11:04:15 AM
to puppe...@googlegroups.com
Hmm. Lots of things are possible, just need to avoid collision with
the parameter naming.

Myresource <<| .environment == $$::environment |>>         # dalen's suggestion
Myresource <<| _environment == $$::environment |>>         # alternate to dalen's suggestion
Myresource <<| catalog.environment == $$::environment |>>  # implies that 'catalog' is an object with subparameters
Myresource <<| Environment == $$::environment |>>          # ye old capitalization like other parts of puppet
Myresource <<| same_environment? |>>                       # short-hand for matching the same environment

ken.

Trevor Vaughan

Apr 7, 2014, 11:36:31 AM
to puppe...@googlegroups.com
I like this one:


Myresource <<| Environment == $$::environment |>>             # ye old capitalization like other parts of puppet

It's easy to read and makes me think "a thing that's defined somewhere else".

Trevor






--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
tvau...@onyxpoint.com

-- This account not approved for unencrypted proprietary information --

Henrik Lindberg

Apr 7, 2014, 9:05:23 PM
to puppe...@googlegroups.com
On 2014-07-04 17:04, Ken Barber wrote:
> Hmm. Lots of things are possible, just need to avoid collision with
> the parameter naming.
>
> Myresource <<| .environment == $$::environment |>> # dalen's suggestion

Nah, that goes down the path of using different syntax and even
terminals in queries.

> Myresource <<| _environment == $$::environment |>> # alternate to dalen's suggestion

Nah, a NAME cannot start with _

> Myresource <<| catalog.environment == $$::environment |>> # implies that 'catalog' is an object with subparameters

Better, still has $$.

> Myresource <<| Environment == $$::environment |>> # ye old capitalization like other parts of puppet

Environment would by general rule be a type, and typically would mean
any Environment. Confusing.

> Myresource <<| same_environment? |>> # short-hand for matching the same environment

Introduces a name that ends with ?, not supported.

Of the proposals above, the one that introduces catalog.environment is
closest to what is supportable.

If a query of

Resource <<| Q |>>

means resolve against all environments, then

Environment[production, Myresource] <<| Q |>>

could mean, resolve the query in the given environment.

If we would like to resolve variables that exist only in the query,
that is also an option, like the reference to catalog, but I would like
that to be $catalog, i.e. inside the spaceship, when a query is
evaluated it can have access to things that we would like a user to be
able to reference and access details of. The details should be obtained
using the [] operator since this is what is used for this purpose
everywhere else.

Another option is to use function calls inside the spaceship. They are
not currently valid in a query, so they would be unambiguous and easy
to transform into whatever we want them to mean. These are query
functions, not general puppet functions.

e.g.

# a given environment
Myresource <<| environment(production) |>>

# current environment
Myresource <<| environment($environment) |>>
Myresource <<| environment() |>>
Myresource <<| environment(default) |>>
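As a hedged sketch of why query functions would be easy to transform, the Python below walks a hypothetical tuple-based AST for the spaceship contents and rewrites environment(...) into an environment clause, treating no argument or default as the current environment. The AST shape, the DEFAULT sentinel and to_puppetdb are illustrative assumptions; the ["=", ["parameter", name], value] shape is modeled loosely on PuppetDB's resource query syntax for parameter values.

```python
from typing import Any, List, Tuple

# Hypothetical minimal AST for the expression inside <<| ... |>>:
#   ("==", name, value) | ("and", lhs, rhs) | ("or", lhs, rhs)
#   ("call", function_name, [args])   # query function, e.g. environment()
Node = Tuple[Any, ...]

DEFAULT = object()  # stands in for the Puppet `default` literal


def to_puppetdb(node: Node, current_environment: str) -> List[Any]:
    """Transform a collector query into a PuppetDB-style query clause.

    Variables such as $environment are assumed to have been evaluated by
    the compiler already, so they arrive here as plain strings.
    """
    op = node[0]
    if op in ("and", "or"):
        return [op,
                to_puppetdb(node[1], current_environment),
                to_puppetdb(node[2], current_environment)]
    if op == "==":
        _, name, value = node
        return ["=", ["parameter", name], value]
    if op == "call":
        _, fname, args = node
        if fname != "environment":
            raise ValueError(f"unsupported query function: {fname}")
        if len(args) > 1:
            raise ValueError("environment() takes at most one argument")
        if not args or args[0] is DEFAULT:
            return ["=", "environment", current_environment]
        return ["=", "environment", args[0]]
    raise ValueError(f"unsupported query node: {op!r}")


# Myresource <<| ensure == present and environment() |>>, compiled in "dev"
print(to_puppetdb(("and", ("==", "ensure", "present"),
                   ("call", "environment", [])), "dev"))
```

Because the call is a distinct node type, it cannot be confused with a resource parameter named environment.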


Remember that the 3x parser has a very brittle grammar, and we do not
want to introduce something that will clash with the future parser
grammar. We especially do not want to mess with terminals.

- henrik

Nan Liu

Apr 7, 2014, 10:57:58 PM
to puppet-dev
On Mon, Apr 7, 2014 at 8:04 AM, Ken Barber <k...@puppetlabs.com> wrote:
Hmm. Lots of things are possible, just need to avoid collision with
the parameter naming.

Myresource <<| .environment == $$::environment |>>         # dalen's suggestion
Myresource <<| _environment == $$::environment |>>         # alternate to dalen's suggestion
Myresource <<| catalog.environment == $$::environment |>>  # implies that 'catalog' is an object with subparameters
Myresource <<| Environment == $$::environment |>>          # ye old capitalization like other parts of puppet
Myresource <<| same_environment? |>>                       # short-hand for matching the same environment

Could we use the current $settings::environment? Also, does agent vs. master environment throw a wrench in this discussion?

Nan 

Ken Barber

Apr 10, 2014, 5:19:30 PM
to puppe...@googlegroups.com
>> Hmm. Lots of things are possible, just need to avoid collision with
>> the parameter naming.
>>
>> Myresource <<| .environment == $::environment |>> # dalen's suggestion
>
>
> Nah, that goes down the path of using different syntax and even terminals in
> queries.
>
>
>> Myresource <<| _environment == $::environment |>> # alternate to dalen's suggestion
>
>
> Nah, a NAME, cannot start with _

Actually, the parser breaks on all of these suggestions:
https://gist.github.com/kbarber/10423111

>
>> Myresource <<| catalog.environment == $$::environment |>> # implies that 'catalog' is an object with subparameters
>
>
> Better, still has $$.

Yeah that was a typo I had just copy/pasted.

>> Myresource <<| Environment == $$::environment |>> # ye old capitalization like other parts of puppet
>
> Environment would by general rule be a type, and typically would mean
> any Environment. Confusing.
>
>
>> Myresource <<| same_environment? |>> # short-hand for matching the same environment
>
>
> Introduces a name that ends with ?, not supported.

Well, whatever we use, the idea was a single item for matching the
current environment. Alas, I doubt the parser would allow it.

> Of the proposals above, the one that introduces catalog.environment is
> closest to what is supportable.

Alas, it is not. Currently it seems that anything we could choose
would run the risk of colliding with a real parameter.
So none of the ideas I had above will actually work in the current or
future parser anyway.

I figure:

* We could use something like catalog_environment overriding the
ability to collect on a real catalog_environment parameter in the slim
chance of a collision.
* Fix the language to support something going forward, future parser style.

The latter will take some time, and probably involve you directly
Henrik for the design.

ken.

Henrik Lindberg

Apr 10, 2014, 7:00:11 PM
to puppe...@googlegroups.com
true.

The 3x/current parser is very picky about what it allows in the query. The
only chance of doing something special is to reserve some particular
expressions that would otherwise be interpreted as a regular query -
i.e. checking equality on a virtual parameter name or something like
that. This is both messy and error prone, and probably not
easier to implement than the alternatives (i.e. changing the 3x grammar if
this is to appear before Puppet 4).

If we consider changing the 3x grammar, accepting
Environment[name, Resource] as its LHS seems quite doable, provided the rest
can remain the same. This would be compatible with the future parser
language-wise. (In 4x we just have to do whatever is needed in the
transformation to the query API.)

The other suggestion (a "function call") is probably the same amount of
work in 3x, and also easy to deal with in 4x.

> I figure:
>
> * We could use something like catalog_environment overriding the
> ability to collect on a real catalog_environment parameter in the slim
> chance of a collision.
> * Fix the language to support something going forward, future parser style.
>
> The latter will take some time, and probably involve you directly
> Henrik for the design.
>
I will be glad to help with that.
With the future parser there is a lot more leeway - this is because it is
the validation step that forbids constructs that are semantically illegal
in the Puppet Language but still parse (following the
principle "I hear what you are saying, and you are wrong because", as
opposed to going "na na na na, can't hear you")... anyway...

Internally syntax wise it is something like:

CollectionExpression
: Expression Query ('{' ... '}')?
;

Query
: '<|' Expression? '|>'
| '<<|' Expression? '|>>'
;


i.e. there is a lot of flexibility when parsing. The query operators
have higher precedence than most operators, but you can always put an
expression in parentheses - e.g.

(1+2) <| query |>

Even if that particular example is meaningless, a function call is not
(one that returns the type to query), or a variable expression - e.g.

$x = Environment[production, File]
$x <| ... |>

Note that the query is just one Expression, i.e. parser wise it could be:
a + b * 3 == 4 and foo(1,2,3) + bar or true != false and blah(blah)

The validator kicks in after parsing and raises errors for anything that
is illegal. The validator has one particular way to treat the expression
allowed in a query. Thus to experiment, you would need a tweaked
validator. (Not hard to do when playing with it or trying things out.)
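The parse-permissively-then-validate approach can be sketched in Python: the parser is assumed to accept any expression, and a separate validator reports the constructs that are illegal inside a query. The tuple AST encoding, the allowed operators and QUERY_FUNCTIONS are assumptions for illustration, not the future parser's real validator.

```python
from typing import Any, List, Tuple

# Hypothetical AST from a permissive parser. Leaves are ("name", str) and
# ("literal", value); calls are ("call", name, [args]); everything else is
# (operator, lhs, rhs).
Node = Tuple[Any, ...]

QUERY_FUNCTIONS = {"environment"}  # hypothetical query-only functions


def validate_query(node: Node) -> List[str]:
    """Collect errors for constructs that parse but are illegal in a query."""
    errors: List[str] = []

    def visit(n: Node) -> None:
        kind = n[0]
        if kind in ("name", "literal"):
            return
        if kind == "call":
            if n[1] not in QUERY_FUNCTIONS:
                errors.append(f"function '{n[1]}' is not allowed in a query")
            for arg in n[2]:
                visit(arg)
            return
        if kind in ("==", "!="):
            if n[1][0] != "name":
                errors.append(f"left side of '{kind}' must be a name")
            visit(n[2])
            return
        if kind in ("and", "or"):
            visit(n[1])
            visit(n[2])
            return
        # Anything else parsed fine but has no meaning in a query.
        errors.append(f"operator '{kind}' is not allowed in a query")
        for child in n[1:]:
            if isinstance(child, tuple):
                visit(child)

    visit(node)
    return errors


print(validate_query(("+", ("literal", 1), ("literal", 2))))
```

Collecting all errors instead of stopping at the first one matches the "I hear what you are saying, and you are wrong because" style of reporting.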

If you want to go out on a limb - calls support lambdas, and those can
be transformed into something else (think Clojure...); that way you could
send a function to execute on the server. Validation can ensure that it
only contains supported expressions....

I think there is a lot that can be done wrt supporting queries expressed
in the Puppet Language.

Regards
- henrik

Ken Barber

Apr 10, 2014, 7:44:36 PM
to puppe...@googlegroups.com
> The 3x/current parser is very picky about what it allows in the query. The only
> chance of doing something special is to reserve some particular expressions
> that would otherwise be interpreted as a regular query - i.e. checking
> equality on a virtual parameter name or something like that. This is both
> messy and error prone, and probably not
> easier to implement than the alternatives (i.e. changing the 3x grammar if
> this is to appear before Puppet 4).

Okay.

> If we consider changing the 3x grammar, accepting
> Environment[name, Resource] as its LHS seems quite doable, provided the rest can
> remain the same. This would be compatible with the future parser
> language-wise. (In 4x we just have to do whatever is needed in the transformation to
> the query API.)
>
> The other suggestion (a "function call") is probably the same amount of work
> in 3x, and also easy to deal with in 4x.

I wouldn't want to sacrifice a nice grammar for getting anything
rushed in too much. Especially if we see a greater capability down the
road for a more complete query capability.

>> I figure:
>>
>> * We could use something like catalog_environment overriding the
>> ability to collect on a real catalog_environment parameter in the slim
>> chance of a collision.
>> * Fix the language to support something going forward, future parser
>> style.
>>
>> The latter will take some time, and probably involve you directly
>> Henrik for the design.
>>
> I will be glad to help with that.
> With the future parser there is a lot more leeway - this is because it is the
> validation step that forbids constructs that are semantically illegal in
> the Puppet Language but still parse (following the principle "I hear what
> you are saying, and you are wrong because", as opposed to going "na na na
> na, can't hear you")... anyway...

So much nicer. When doing training I had an eye for picking up bugs in the
code, but it was always by feel more than by just 'reading the error
message'.
So here is the interesting bit at the moment. When you provide a
constraint in a naive way:

Notify <<| catalog_environment == $::environment |>>

It works to a certain extent; however, my first example
failed. And you'll laugh when you hear why.

When you export and collect on the same node, it's up to Puppet to do
the collection internally. This is presumably so it can do lazy
evaluation, and it's also a chicken-and-egg problem: it needs all
the collected resources before it can create a full catalog. So in the
case of a node exporting and collecting its own resources, Puppet doesn't
know about catalog_environment, so it just doesn't match.

Now, we do get the query via the PuppetDB terminus, but we always
specifically exclude the node itself from the collection:

["and",
["=", "type", "Notify"],
["=", "exported", true],
["not", ["=", "certname", "kb.local"]],
["=", "environment", "production"]]

This is to avoid the duplication problem, not to mention you don't
want to collect the resources from the _last_ catalog, it isn't up to
date.

What this means is that, while this solution would work for collecting
from other nodes, it fails when you are exporting and collecting on
the same node. So at the moment, even something naive like
catalog_environment == $::environment most probably doesn't work
without a Puppet change.
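A simplified Python model of the failure described above, under stated assumptions: the node's own exports are matched in-process against their parameters only, while records from PuppetDB expose the exporting catalog's environment, which the terminus is assumed to map catalog_environment onto. collect, matches_local and matches_remote are hypothetical names, and the data is invented for the example.

```python
from typing import Any, Dict, List

Params = Dict[str, Any]


def matches_local(params: Params, term: str, value: Any) -> bool:
    # Resources exported by the catalog being compiled are only known as
    # parameter sets, and none of them has a catalog_environment parameter.
    return params.get(term) == value


def matches_remote(record: Dict[str, Any], term: str, value: Any) -> bool:
    # Records returned by PuppetDB carry the exporting catalog's environment,
    # which the terminus can map the term onto.
    if term == "catalog_environment":
        return record["environment"] == value
    return record["parameters"].get(term) == value


def collect(certname: str, own_exports: List[Params],
            puppetdb_records: List[Dict[str, Any]],
            term: str, value: Any) -> List[Params]:
    local = [p for p in own_exports if matches_local(p, term, value)]
    # Mirrors ["not", ["=", "certname", certname]] in the PuppetDB query:
    # the node's stored copy comes from its previous catalog and is stale.
    remote = [r["parameters"] for r in puppetdb_records
              if r["certname"] != certname and matches_remote(r, term, value)]
    return local + remote


own = [{"message": "exported by kb.local"}]
stored = [
    {"certname": "kb.local", "environment": "production",
     "parameters": {"message": "stale copy from last run"}},
    {"certname": "web1", "environment": "production",
     "parameters": {"message": "exported by web1"}},
    {"certname": "web2", "environment": "dev",
     "parameters": {"message": "exported by web2"}},
]
# kb.local's own Notify is silently dropped: only web1's resource matches.
print(collect("kb.local", own, stored, "catalog_environment", "production"))
```

Neither path sees the current, local resources with an environment attached, which is why a change in Puppet itself appears to be needed.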

ken.

Ken Barber

Apr 11, 2014, 11:59:34 AM
to puppe...@googlegroups.com
So, as things stand, doing this in the language without an actual
change to Puppet is proving hard. It's entirely possible in principle, but
we'll most probably have to ship environment support without the ability
to constrain collection by environment for now.

The only other 'quick and dirty' option I can think of is to do this
back in the terminus configuration again, which some people are
clearly not fans of.

Any other ideas from those watching at home?

ken.

Henrik Lindberg

Apr 11, 2014, 12:09:29 PM
to puppe...@googlegroups.com
On 2014-11-04 17:59, Ken Barber wrote:
> So, as things stand, doing this in the language without an actual
> change to Puppet is proving hard. It's entirely possible in principle, but
> we'll most probably have to ship environment support without the ability
> to constrain collection by environment for now.
>
> The only other 'quick and dirty' option I can think of is to do this
> back in the terminus configuration again, which some people are
> clearly not fans of.
>
> Any other ideas from those watching at home?
>
Very ugly solution: monkey-patch collexpr.rb from within the PuppetDB module.
(This is how the experimental Hiera support for data in modules installs
itself. IIRC, a terminus initializes itself early.)

Yes, it is horribly ugly.

- henrik


Ken Barber

Apr 11, 2014, 12:12:06 PM
to puppe...@googlegroups.com
Lol. Yeah, I had thought of that idea, but wasn't fond of it so didn't
mention it. It's a reasonably anti-social way of doing it :-).

ken.

Ken Barber

Apr 11, 2014, 12:24:34 PM
to puppe...@googlegroups.com
I've created a fairly open ticket here for Puppet itself to cover the
changes in Puppet at least for now. That way we can track work for a
more 'final solution' even if we do find something temporary to do:

https://tickets.puppetlabs.com/browse/PUP-2217

ken.