Puppet environments implementation question - get_conf() location and result caching

37 views
Skip to first unread message

Bostjan Skufca

unread,
Apr 15, 2015, 10:30:03 PM4/15/15
to puppe...@googlegroups.com
Just recently I was looking at the environments part of puppet implementation, and I have stumbled upon this curious gimmick.

There is Puppet::Environments class, which is a class for general Environments management (searching, loaders, caching, etc).
Then there is Puppet::Node::Environment which is instantiated for every environent found. Logical up to this point.

But, contrary to all OOP notions, get_conf() function is not a method of Puppet::Node::Environment class.
Rather, it is a lookup function in Puppet::Environments class which takes environment name as an argument and fetches Puppet::Settings::EnvironmentConf instance.

Puppet::Environments has two function for environment and environment data retrieval:
- get() fetches Puppet::Node::Environment instance
- get_conf() fetches Puppet::Settings::EnvironmentConf instance

The curious thing is:
- get() method operates with cache
- get_conf() does not include any caching

I tested this on a catalog of 1633 resources, and:
- get() was called around 220 times
- get_conf() was called around 25.000 times

These are the comments above get_conf() in environments.rb:
    # This implementation evicts the cache, and always gets the current
    # configuration of the environment
    #
    # TODO: While this is wasteful since it
    # needs to go on a search for the conf, it is too disruptive to optimize
    # this.

My questions are:
- why does get_conf() method belong to Environments instead of Node::Environment class?
- why are get_conf() results not cached? Why is it "too disruptive"?

Thanks for answers,
b.

Henrik Lindberg

unread,
Apr 16, 2015, 8:01:29 AM4/16/15
to puppe...@googlegroups.com
The path to the new environments implementation (supporting both
directory environments, and the now deprecated (and in 4.0 removed)
dynamic environments) was paved with problems as it affected more or
less the entire code base.

It should be fine to cache the env conf in an instance of an
environment, and it should be possible to obtain the config given
an instantiated environment. This is safe since the conf will then be
evicted when the environment is evicted (and while the environment is
cached it will not reload or change).

I believe that the large number of calls
to get the conf could almost exclusively be for an already loaded
environment (I should be measured though).
All other calls would be for information purposes and those cannot be
cached unless the environment.conf file is watched (which is difficult
since it may not exist, come into existence, be removed etc.). For those
calls, the cost of checking if the file is up to date is almost as
expensive as loading the conf.

Since the 3.x implementation is very complex due to the dynamic
environments, I suggest that this should be fixed for 4.x.

- henrik

> Thanks for answers,
> b.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to puppet-dev+...@googlegroups.com
> <mailto:puppet-dev+...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-dev/53b560d5-1c8f-4b00-944c-3156343a175f%40googlegroups.com
> <https://groups.google.com/d/msgid/puppet-dev/53b560d5-1c8f-4b00-944c-3156343a175f%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.


--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

Bostjan Skufca

unread,
Apr 16, 2015, 12:34:31 PM4/16/15
to puppe...@googlegroups.com
It should be fine to cache the env conf in an instance of an
environment, and it should be possible to obtain the config given
an instantiated environment. This is safe since the conf will then be
evicted when the environment is evicted (and while the environment is
cached it will not reload or change).

It would be more logical that way too, for the user. If there is "environment timeout", user would expect that all parts of environment (env configuration, code, even manifests) are cached. At least that was my understanding until I actually looked at the code.

Shall I create a Jira ticket and pull request with working code change?

 
I believe that the large number of calls
to get the conf could almost exclusively be for an already loaded
environment (I should be measured though).

Yes, this is true. All the calls are for the environment where the node/agent resides. I tested this with single node only, though.

 
All other calls would be for information purposes and those cannot be
cached unless the environment.conf file is watched (which is difficult
since it may not exist, come into existence, be removed etc.). For those
calls, the cost of checking if the file is up to date is almost as
expensive as loading the conf.

"All other calls" - do you mean calls for other environments?
If so, as said above, I did not notice this on my setup (could not). But I believe that if env caching follows the same pattern for all environments, it should not be a problem.


Since the 3.x implementation is very complex due to the dynamic
environments, I suggest that this should be fixed for 4.x.

This is probably for the best, true. If one wants to refrain from using dynamic environments and wishes to use 3.7.x, a separate patch may be provided.


b.

Henrik Lindberg

unread,
Apr 17, 2015, 9:04:08 AM4/17/15
to puppe...@googlegroups.com
Yes please. A ticket for fix-version PUP 4.x (we then update it with the
actual release it will go into). I don't think it is worth the trouble
of fixing this on 3.x. Target the code to the master branch.

> I believe that the large number of calls
> to get the conf could almost exclusively be for an already loaded
> environment (I should be measured though).
>
>
> Yes, this is true. All the calls are for the environment where the
> node/agent resides. I tested this with single node only, though.
>
Good to know. Then this is worth doing.

> All other calls would be for information purposes and those cannot be
> cached unless the environment.conf file is watched (which is difficult
> since it may not exist, come into existence, be removed etc.). For
> those
> calls, the cost of checking if the file is up to date is almost as
> expensive as loading the conf.
>
>
> "All other calls" - do you mean calls for other environments?
> If so, as said above, I did not notice this on my setup (could not). But
> I believe that if env caching follows the same pattern for all
> environments, it should not be a problem.
>
I do mean calls for other environments than the one(s) currently loaded
and in use/cached by the master. This happens for instance when
commands/services query for available environments, asks which classes
that are in an environment etc. These calls when made for environments
that are loaded, should read the conf directly (they are almost never
interested in loading the environment they want information about).

>
> Since the 3.x implementation is very complex due to the dynamic
> environments, I suggest that this should be fixed for 4.x.
>
>
> This is probably for the best, true. If one wants to refrain from using
> dynamic environments and wishes to use 3.7.x, a separate patch may be
> provided.
>
Cheers, and thanks for taking a stab at improving this!

- henrik
Reply all
Reply to author
Forward
0 new messages