Environment Caching - RFC

Henrik Lindberg

Apr 21, 2014, 5:29:23 PM
to puppe...@googlegroups.com
Hi,
We have been looking into environment caching and have some thoughts and
ideas about how this can be done. We would love to get your input on those
ideas, and your thoughts about their usefulness.

There is a google document that has the long story - it is open for
commenting. It is not required reading as the essence is outlined here.
The doc is here:
https://docs.google.com/a/puppetlabs.com/document/d/1G-4Z6vi6Tv5xZtzVh7aT2zNWbOxJ3BGfJu31pAHxS7g/edit?disco=AAAAAGtMYOI#heading=h.rpgaxghcfqol

The current state of caching environments
---
A legacy environment caches the result of parsing manifests and loading
functions / types, and reacts to changed files. It does this by
recording the mtime of each file as it is parsed / read. Later, if the
same file would be parsed again, it uses the already cached result.
If the file is stale, the entire cache is cleared and it starts
from scratch.

It does not however react to added files. It also does not recognize
changes in files evaluated as a consequence of evaluating ruby logic
(i.e. if a function, type, etc. required something, that is not recorded).
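
To make the behavior described above concrete, here is a rough Python sketch (a toy model, not Puppet's actual code; the class and method names are made up for illustration). It records the mtime of each file when it is parsed, clears everything when any recorded file is stale, and, like the legacy cache, never notices a file it has not parsed before.

```python
import os


def _mtime(path):
    """Return the file's mtime in nanoseconds, or None if it is gone."""
    try:
        return os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return None


class WatchedFileCache:
    """Toy model: remember each parsed file's mtime and drop the whole
    cache as soon as one remembered file has changed."""

    def __init__(self, parse):
        self._parse = parse   # callable(path) -> parsed result
        self._results = {}    # path -> cached parse result
        self._mtimes = {}     # path -> mtime recorded when parsed

    def any_stale(self):
        # Only files parsed earlier are examined, so a file that was
        # added later is invisible to this check.
        return any(_mtime(p) != m for p, m in self._mtimes.items())

    def get(self, path):
        if self.any_stale():
            # One stale file invalidates everything.
            self._results.clear()
            self._mtimes.clear()
        if path not in self._results:
            self._mtimes[path] = _mtime(path)
            self._results[path] = self._parse(path)
        return self._results[path]
```

Note that every call to get() costs one stat per recorded file, which is the scanning overhead discussed below.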

The new directory based environments do not support caching. (And now
we want to address this).

The problem with caching
---
The problem with caching is that it can be quite costly to compute
whether a cache is still valid, and we found that different scenarios
benefit from different caching strategies.

In an environment where the number of modules/manifests actually used
per individual node is low relative to the number present in the
environment, checking the cache can be slower than starting with a
clean slate every time.
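
A back-of-the-envelope way to see this trade-off, as a Python sketch. The cost model (one stat per recorded file versus one parse per file a node actually uses) is a simplifying assumption, and the numbers in the usage note are invented purely for illustration; they are not measurements.

```python
def cheaper_to_start_clean(watched_files, used_files, stat_cost, parse_cost):
    """Return True when parsing only what one node uses costs less than
    stat-checking every file the cache has recorded.

    Costs are in arbitrary but identical units (say, milliseconds).
    """
    check_cost = watched_files * stat_cost   # validate the whole cache
    clean_cost = used_files * parse_cost     # parse just what is needed
    return clean_cost < check_cost
```

With made-up figures of 20000 recorded files at 0.05 per stat and 30 used files at 5 per parse, the check costs 1000 against 150 for a clean start, so the clean slate wins; with 2000 used files it does not.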

Proposed Strategies
---
We think there is a core set of strategies that a user should be able to
select. These should cover the typical usage scenarios.

* NONE - no caching, each catalog compilation starts with a clean slate.
This is the current state of directory based environments, and it
could also be made to apply to legacy environments. This is good in
a very dynamic environment / development, or with a low "signal/noise"
ratio.

* REBOOT - (the opposite of NONE) - cache everything, never check for
changes. A reboot of the master is required for it to react to
changes.
This is good for a static configuration, and where the organization
always takes down the master for other reasons when there are changes.
This strategy avoids scanning, and is thus a speed improvement for
configurations with a large set of files.

* TIMEOUT - cache all environments with a 'time to live' (TTL). When a
request is made for an environment where the TTL has expired it
starts that environment with a clean slate.
This is a compromise - it will pick up all changes (even additions),
but it will take one "TTL" before they are picked up (say 5 minutes;
configurable).

These three schemes are believed to cover the different usage scenarios.
They all have the benefit that they do not require watching any files
(thereby drastically reducing the number of stat calls).
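
As a rough sketch of why none of the three needs to look at files, each strategy can be expressed as a time to live on the loaded environment. The Python below is illustrative only; the names Strategy, effective_ttl and must_reload are invented here, and 300 seconds simply mirrors the "say 5 minutes" example above.

```python
import enum
import math


class Strategy(enum.Enum):
    NONE = "none"
    REBOOT = "reboot"
    TIMEOUT = "timeout"


def effective_ttl(strategy, timeout_seconds=300.0):
    """Express each strategy as a time to live in seconds."""
    if strategy is Strategy.NONE:
        return 0.0            # expires immediately
    if strategy is Strategy.REBOOT:
        return math.inf       # never expires while the master runs
    return float(timeout_seconds)


def must_reload(strategy, loaded_at, now, timeout_seconds=300.0):
    """Decide from the load time alone; no file is stat'ed."""
    return now - loaded_at >= effective_ttl(strategy, timeout_seconds)
```

The decision depends only on when the environment was loaded, which is what removes the stat calls.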

Strategy that is probably not needed:

* ENVDIRCHANGE - watches the directory that represents
the environment. Reloads if the directory itself is stale (using the
filetimeout setting to cap the number of times it checks). Thus, it
will react to changes to the environment root only (which typically
does not happen when changing content in the environment, but is
triggered if the environment's configuration file is added or removed).
To pick up any other changes, the user would need to touch the
directory.
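
For illustration, ENVDIRCHANGE could look roughly like the Python sketch below. This is an assumption about how such a check might be written, not an existing implementation; the class name and the 15 second default are placeholders. It stats only the environment directory and uses a filetimeout-style interval to cap how often it does so.

```python
import os
import time


class EnvDirWatcher:
    """Watch a single directory's mtime, checking at most once per
    'filetimeout' seconds."""

    def __init__(self, env_dir, filetimeout=15.0, clock=time.monotonic):
        self._dir = env_dir
        self._filetimeout = filetimeout
        self._clock = clock
        self._mtime = os.stat(env_dir).st_mtime_ns
        self._last_check = clock()

    def changed(self):
        now = self._clock()
        if now - self._last_check < self._filetimeout:
            return False      # checked recently; skip the stat call
        self._last_check = now
        current = os.stat(self._dir).st_mtime_ns
        if current != self._mtime:
            self._mtime = current
            return True       # directory was touched: reload
        return False
```

Editing an existing manifest somewhere below the directory would typically leave the directory's own mtime alone, so the user would still have to touch the directory to trigger a reload.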

Strategies we think are not needed:

* SCAN - like today where every file is watched.
* CONFCHANGE - watch/scan all configuration files.

Feedback?
---
Here are a couple of questions to start with...

* What do you think of the proposed strategies?
* If you like the scanning strategy, what use cases do you see it would
benefit that the proposed strategies do not handle?
* Any other ideas?
* Any use cases you feel strongly about? Scenarios we need to consider...

Regards
- henrik

Andy Parker

Apr 21, 2014, 5:37:34 PM
to puppe...@googlegroups.com
On Mon, Apr 21, 2014 at 2:29 PM, Henrik Lindberg <henrik....@cloudsmith.com> wrote:
> Hi,
> We have been looking into environment caching and have some thoughts and ideas about how this can be done. Love to get your input on those ideas, and your thoughts about their usefulness.
>
> There is a google document that has the long story - it is open for commenting. It is not required reading as the essence is outlined here.
> The doc is here: https://docs.google.com/a/puppetlabs.com/document/d/1G-4Z6vi6Tv5xZtzVh7aT2zNWbOxJ3BGfJu31pAHxS7g/edit?disco=AAAAAGtMYOI#heading=h.rpgaxghcfqol
>
> The current state of caching environments
> ---
> A legacy environment caches the result of parsing manifests and loading functions / types, and reacts to changed files. It does this by recording the mtime of each file as it is parsed / read. Later, if the same file would be parsed again, it will use the already cached produced result. If the file is stale, the entire cache is cleared and it starts from scratch.
>
> It does not however react to added files. It also does not recognize changes in files evaluated as a consequence of evaluating ruby logic (i.e. if a function, type, etc. required something, that is not recorded).

It will react to added files, but only after the filetimeout has expired on another file that will cause it to pick up the new file. It all gets very complicated.

> The new directory based environments do not support caching. (And now we want to address this).
>
> The problem with caching
> ---
> The problem with caching is that it can be quite costly to compute and we found that different scenarios benefit from different caching strategies.
>
> In an environment where the ratio of modules/manifests present in the environment vs. the number actually used per individual node is low, checking caching can be slower than starting with a clean slate every time.

In all of the following strategies, does this also involve removing the known_resource_types WatchedFile caching system?

> - henrik

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/lj42jj%24b59%241%40ger.gmane.org.
For more options, visit https://groups.google.com/d/optout.



--
Andrew Parker
Freenode: zaphod42
Twitter: @aparker42
Software Developer


Jeff Bachtel

Apr 21, 2014, 7:09:46 PM
to puppe...@googlegroups.com
I don't understand how inotify(7)+epoll(7) watching all environment
subdirectories for changes (CONFCHANGE, but pointing at all directories
vice files) would not be more performant than SCAN.

Jeff

Henrik Lindberg

Apr 21, 2014, 7:17:08 PM
to puppe...@googlegroups.com
On 2014-21-04 23:37, Andy Parker wrote:
> On Mon, Apr 21, 2014 at 2:29 PM, Henrik Lindberg
> <henrik....@cloudsmith.com <mailto:henrik....@cloudsmith.com>>
> wrote:
>
> Hi,
> We have been looking into environment caching and have some thoughts
> and ideas about how this can be done. Love to get your input on
> those ideas, and your thoughts about their usefulness.
>
> There is a google document that has the long story - it is open for
> commenting. It is not required reading as the essence is outlined here.
> The doc is here:
> https://docs.google.com/a/puppetlabs.com/document/d/1G-4Z6vi6Tv5xZtzVh7aT2zNWbOxJ3BGfJu31pAHxS7g/edit?disco=AAAAAGtMYOI#heading=h.rpgaxghcfqol
> [...]
>
> In all of the following strategies, does this also involve removing the
> known_resource_types WatchedFile caching system?

Yes, that is the idea. The NONE, and REBOOT are simple. The TIMEOUT
operates on the request for the environment only (does not need to look
at any files).

The ENVDIRCHANGE would only need to look at a single file (the env
directory).

As for the strategies that we think are not needed: SCAN obviously needs
to watch files, and we do not want that. CONFCHANGE would need to watch
configuration files (it is unclear exactly which). That is also somewhat
fuzzy, and it seems we do not need to support it if we have the others.

Henrik Lindberg

Apr 21, 2014, 10:02:50 PM
to puppe...@googlegroups.com
On 2014-22-04 1:09, Jeff Bachtel wrote:
> I don't understand how inotify(7)+epoll(7) watching all environment
> subdirectories for changes (CONFCHANGE, but pointing at all directories
> vice files) would not be more performant than SCAN.
>
As far as I know, inotify and epoll are not available on all platforms
that Puppet supports, and we currently have no plans to add such support
to those platforms.

Also, while it probably is more performant than individually watching
every file, it is not without a performance penalty. There is also
substantial complexity in using inotify - see the "Limitations and
caveats" section in the documentation, which states:

"Limitations and caveats

Inotify monitoring of directories is not recursive: to monitor
subdirectories under a directory, additional watches must be created.
This can take a significant amount time for large directory trees.
The inotify API provides no information about the user or process that
triggered the inotify event. In particular, there is no easy way for a
process that is monitoring events via inotify to distinguish events that
it triggers itself from those that are triggered by other processes.

Note that the event queue can overflow. In this case, events are lost.
Robust applications should handle the possibility of lost events gracefully.

The inotify API identifies affected files by filename. However, by the
time an application processes an inotify event, the filename may already
have been deleted or renamed.

If monitoring an entire directory subtree, and a new subdirectory is
created in that tree, be aware that by the time you create a watch for
the new subdirectory, new files may already have been created in the
subdirectory. Therefore, you may want to scan the contents of the
subdirectory immediately after adding the watch."

Puppet is pretty much exposed to all of those caveats - and the only
remedy is to... scan. Which is the very thing we want to avoid.

The primary problem is that there are no transaction boundaries around
changes in the file system. Puppet tries to run on top of a potentially
changing set of files - in fact bad things can happen when changing
files while puppet is compiling a catalog that depends on the changing
files. We do plan to address those issues at some point during Puppet
4.x. That is, such problems prevail even if changes are detected in a
more efficient way (say using inotify), due to the asynchronous nature
of watching / notifying.

Meanwhile, we do believe that the three proposed strategies NONE,
REBOOT, and TIMEOUT are sufficient to handle the usage scenarios we have
identified, and that they will work with as much safety as the current
implementation, and that they will be more accurate and easier to
understand, and most importantly, that they do not have to be based on
scanning anything.

I hope that explains our reasoning.

Regards
- henrik

Henrik Lindberg

Apr 21, 2014, 10:06:13 PM
to puppe...@googlegroups.com
Update - it is both reasonable and possible to support the caching
strategy per environment.

This means that environments used for experimenting can be set to not
use caching (expires immediately), and stable environments can be given
a longer time (or infinite for the "reboot" strategy).
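
A minimal sketch of what per-environment selection could look like, in Python. The setting values "0" and "unlimited", the dictionary layout and the function names are assumptions made for this example; the actual setting names and syntax are not defined in this thread.

```python
import math


def parse_ttl(value):
    """Turn a setting such as "0", "300" or "unlimited" into seconds."""
    text = str(value).strip().lower()
    if text == "unlimited":
        return math.inf       # "reboot" behavior: never expires
    ttl = float(text)
    if ttl < 0:
        raise ValueError("time to live must not be negative")
    return ttl


def ttl_for(environment, per_env_settings, default):
    """Look up an environment's own TTL, else fall back to the default."""
    raw = per_env_settings.get(environment)
    return default if raw is None else parse_ttl(raw)
```

For example, with {"dev": "0", "production": "unlimited", "staging": "300"} the dev environment is never cached, production is cached until restart, and staging expires after five minutes.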

- henrik

Thomas Hallgren

Apr 22, 2014, 2:06:26 AM
to puppe...@googlegroups.com
Would a MANUAL strategy make sense? I.e. instead of rebooting the master, just tell it to clear the cache (perhaps per
environment).

- thomas

Henrik Lindberg

Apr 22, 2014, 8:59:19 AM
to puppe...@googlegroups.com
On 2014-22-04 8:06, Thomas Hallgren wrote:
> Would a MANUAL strategy make sense? I.e. instead of rebooting the
> master, just tell it to clear the cache (perhaps per environment).
>
> - thomas
>

We discussed that - either as a command, or by touching a directory or
file. Maybe that is part of the commands that let you modify the settings.

We also discussed having MANUAL as the only option, but we ruled that
out as it would be too irritating when trying things out (i.e. better to
set NONE for the testing/dev environments).

If people want a MANUAL strategy we can add one.

- henrik




Felix Frank

Apr 22, 2014, 9:04:24 AM
to puppe...@googlegroups.com
If accidental cache clearing does not worry you, I feel that
ENVDIRCHANGE *is* pretty much MANUAL.

Felix

Trevor Vaughan

Apr 22, 2014, 9:41:18 AM
to puppe...@googlegroups.com
The more I read all of this, the more I'm a fan of a MANUAL mode.

A file per environment $ENV_HEAD/clear_cache or something that, when touched, will cause the cache to clear for that environment.

I also like the NONE option since, as was pointed out, dev/test environments shouldn't be caching.

The others don't interest me all that much in general.

If the per environment option is used, I'd like to see an associated puppet option to go with it.

# Clear the 'production' environment cache
puppet cache clear

# Clear the 'test' environment cache
puppet cache clear --env='test'

Thanks,

Trevor


--
Trevor Vaughan
Vice President, Onyx Point, Inc

John Bollinger

Apr 22, 2014, 4:32:04 PM
to puppe...@googlegroups.com


On Monday, April 21, 2014 4:29:23 PM UTC-5, henrik lindberg wrote:
> [...]
>
> We think there is a core set of strategies that a user should be able to
> select. These should cover the typical usage scenarios.
>
> * NONE - no caching, each catalog compilation starts with a clean slate.
>   This is the current state of directory based environments,

Is that why a user is reporting over on puppet-users that turning on directory environments explodes his catalog compilation times?

I don't see any reason to object to this strategy, but I'm inclined to doubt that many sites will find it useful in production.

> and it
>   could also be made to apply to legacy environments. This is good in
>   a very dynamic environment / development or low "signal/noise" ratio.
>
> * REBOOT - (the opposite of NONE) - cache everything, never check for
>   changes. A reboot of the master is required for it to react to
>   changes.
>   This is good for a static configuration, and where the organization
>   always takes down the master for other reasons when there are changes.
>   This strategy avoids scanning, and is thus a speed improvement for
>   configurations with a large set of files.

I could see the REBOOT strategy being used at very sensitive or tightly-controlled sites, but I'm inclined to think that ENVDIRCHANGE would be preferable to many people on account of the ability it affords to trigger cache invalidation without restarting the master.

> * TIMEOUT - cache all environments with a 'time to live' (TTL). When a
>   request is made for an environment where the TTL has expired it
>   starts that environment with a clean slate.
>   This is a compromise - it will pick up all changes (even additions),
>   but it will take one "TTL" before they are picked up (say 5 minutes;
>   configurable).

That one makes me very nervous. It seems like an open invitation for manifest version shear. I would not even consider using it, myself; I'd prefer even scanning.

> These three schemes are believed to cover the different usage scenarios.
> They all have the benefit that they do not require watching any files
> (thereby drastically reducing the number of stat calls).
>
> Strategy that is probably not needed:
>
> * ENVDIRCHANGE - watches the directory that represents
>   the environment. Reloads if the directory itself is stale (using
>   filetimeout setting to cap the number of times it checks). Thus, it
>   will react to changes to the environment root only (which typically
>   does not happen when changing content in the environment, but is
>   triggered if the environment's configuration file is added or removed).
>   To pick up any other changes, the user would need to touch the
>   directory.

Perhaps it's unneeded, but that's the option I like best among those presented. I like having a means to manually flush the cache without restarting the master(s).

> Strategies we think are not needed:
>
> * SCAN - like today where every file is watched.
> * CONFCHANGE - watch/scan all configuration files.
>
> Feedback ?

I'm all for moving away from the SCAN approach. As for CONFCHANGE, is the idea basically a more clueful variant of ENVDIRCHANGE? I could imagine that being of interest, but if you're looking to streamline initial rollout then I could see deferring it until you can document demand for it.

> ---
> Here are a couple of questions to start with...
>
> * What do you think of the proposed strategies?

See above.

> * If you like the scanning strategy, what use cases do you see it would
> benefit that the proposed strategies do not handle?

Relative to scanning, they all make it a little harder to use an approach where the master automatically pulls manifests from VCS.

> * Any other ideas?

Can the catalog compiler be induced to abandon its progress and restart the current catalog when the cache for its environment is flushed? That might make the TIMEOUT strategy more palatable, and it would be appropriate for some other strategies, too.

> * Any use cases you feel strongly about? Scenarios we need to consider...

If I'm actively changing the manifest set on my master, then I know better than the master when I'm done, and I favor being able to hold off on flushing the cache until then. Also, I like being able to flush the cache of just one environment at a time, and without bringing down the master to do so.


John

Felix Frank

Apr 22, 2014, 4:39:15 PM
to puppe...@googlegroups.com
On 04/21/2014 11:29 PM, Henrik Lindberg wrote:
>
> * Any use cases you feel strongly about? Scenarios we need to consider...

Following the ongoing discussion, I start to feel that we are really
setting ourselves up for quite some additional pleas for help on the
mailing lists. Whenever people witness weird behavior that may be
related to caching effects (oh good old days of 0.25 masters sans
modules) we'll be asking back for their precise caching settings. If
those differ from our own, we will always nourish the slight doubt that
there may be latent bugs in whatever strategy the user is employing etc.

What I'm getting at is - do we want this kind of flexibility out there?
Or would we rather now agree about the strategy that is least awful for
the majority of use cases?

Thanks,
Felix

Andy Parker

Apr 22, 2014, 8:16:47 PM
to puppe...@googlegroups.com
On Tue, Apr 22, 2014 at 1:32 PM, John Bollinger <john.bo...@stjude.org> wrote:
> On Monday, April 21, 2014 4:29:23 PM UTC-5, henrik lindberg wrote:
> > [...]
> >
> > We think there is a core set of strategies that a user should be able to
> > select. These should cover the typical usage scenarios.
> >
> > * NONE - no caching, each catalog compilation starts with a clean slate.
> >   This is the current state of directory based environments,
>
> Is that why a user is reporting over on puppet-users that turning on directory environments explodes his catalog compilation times?

Turns out that the problem there is related to another problem that we found related to how puppet parses resource references combined with how environments are loaded. Henrik will be posting a link to a PR shortly that should have a fix for that.

> I don't see any reason to object to this strategy, but I'm inclined to doubt that many sites will find it useful in production.

Hopefully nobody tries to use this in production :) The intention of this style was for development systems when you are in a development loop of: edit manifests, run agent.

> > and it
> >   could also be made to apply to legacy environments. This is good in
> >   a very dynamic environment / development or low "signal/noise" ratio.
> >
> > * REBOOT - (the opposite of NONE) - cache everything, never check for
> >   changes. A reboot of the master is required for it to react to
> >   changes.
> >   This is good for a static configuration, and where the organization
> >   always takes down the master for other reasons when there are changes.
> >   This strategy avoids scanning, and is thus a speed improvement for
> >   configurations with a large set of files.
>
> I could see the REBOOT strategy being used at very sensitive or tightly-controlled sites, but I'm inclined to think that ENVDIRCHANGE would be preferable to many people on account of the ability it affords to trigger cache invalidation without restarting the master.

Or in more performance oriented sites? It would reduce the number of stat calls that the master ends up doing. The thinking for this one was that in a production environment the master only should reread when the manifests have changed and that should be explicit. This can be done by signaling a graceful restart of the master for either passenger + apache or nginx + unicorn. For the apache setup it just takes sending a HUP to apache and for the nginx setup it takes sending a HUP to unicorn. I think that should provide a better deployment scenario for masters, but I might be wrong.

> > * TIMEOUT - cache all environments with a 'time to live' (TTL). When a
> >   request is made for an environment where the TTL has expired it
> >   starts that environment with a clean slate.
> >   This is a compromise - it will pick up all changes (even additions),
> >   but it will take one "TTL" before they are picked up (say 5 minutes;
> >   configurable).
>
> That one makes me very nervous. It seems like an open invitation for manifest version shear. I would not even consider using it, myself; I'd prefer even scanning.

Yes, it does pose a bit more risk than the REBOOT strategy since it could decide to reload in the middle of a deploy, but the SCAN strategy, which is pretty much what the existing environments do, can be very dangerous, or at least it was in the past.

The old strategy was to rescan during a compile if the timeout expired on *any* of the watched files. We changed this recently (3.5? I lose track of what version things go out in) so that it only reevaluates this at the *beginning* of the catalog compile run, but it still ends up scanning all of the files. The timeout would be very similar, but instead of being based on any file timestamps it would only be based on when the environment was loaded. So if the timeout has expired it just throws away and reloads the environment.

> > These three schemes are believed to cover the different usage scenarios.
> > They all have the benefit that they do not require watching any files
> > (thereby drastically reducing the number of stat calls).
> >
> > Strategy that is probably not needed:
> >
> > * ENVDIRCHANGE - watches the directory that represents
> >   the environment. Reloads if the directory itself is stale (using
> >   filetimeout setting to cap the number of times it checks). Thus, it
> >   will react to changes to the environment root only (which typically
> >   does not happen when changing content in the environment, but is
> >   triggered if the environment's configuration file is added or removed).
> >   To pick up any other changes, the user would need to touch the
> >   directory.
>
> Perhaps it's unneeded, but that's the option I like best among those presented. I like having a means to manually flush the cache without restarting the master(s).

Wouldn't a graceful restart work just as well? I like the REBOOT + graceful restart option because it keeps the behavior of puppet much simpler and under the control of the user.


Henrik Lindberg

Apr 22, 2014, 9:06:48 PM
to puppe...@googlegroups.com
There are basically three settings that make sense for different use cases.

REBOOT - actually setting timeout to "unlimited" or a very long time.
You can force it to reload by using the graceful restart, or simply by
rebooting.

TIMEOUT - a compromise, when willing to sacrifice some cycles now and
again (every n minutes) in order not to have to do a manual graceful
shutdown.

NEVER - actually setting timeout to "0" will not cache at all. This is
good for development.

These cache evictions are based on time to live, and an eviction does
not really destroy anything (the cache just stops holding on to the
environment). A cache entry is only evicted on a request to get a
particular environment. The requester then holds on to this current
environment throughout the compilation. The directory environments do
not memoize the environments, so it should be safe to just not remember
them.
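
To illustrate the eviction behavior described above, here is a small Python sketch. It is an interpretation of the description, not the code in the PR; EnvironmentCache and its loader callback are hypothetical names. An entry is only replaced when someone asks for the environment after its time to live has passed, and whoever already holds the old environment object keeps using it.

```python
import time


class EnvironmentCache:
    """Evict on request only; eviction just drops the cache's reference."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader   # callable(name) -> freshly loaded environment
        self._ttl = ttl         # seconds; 0 = never cache, inf = never expire
        self._clock = clock
        self._entries = {}      # name -> (environment, loaded_at)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]     # still fresh: no reload, no file access
        # Expired or never loaded: load anew. A compilation that obtained
        # the previous object earlier keeps its own reference to it.
        env = self._loader(name)
        if self._ttl > 0:
            self._entries[name] = (env, now)
        return env
```

A compilation would call get() once at its start and keep the returned object until it finishes, so a reload triggered by a later request does not change the environment under a running compilation.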

The PR we are working on (and will announce shortly) also drops the
scanning behavior when an Environment is a "directory based environment".

The main cause of the slowdown observed with directory based
environments seems to be a bug that in the worst case could load an
environment for every resource. We noticed a more than 20x performance
degradation in this scenario (with the given set of manifests to load;
someone else with even more resources could see even worse numbers, or
so we think).

The benchmark numbers we have now look promising.

(Back soon with a post about the PR, and how to try it out)

Regards
- henrik

Henrik Lindberg

Apr 22, 2014, 9:19:57 PM
to puppe...@googlegroups.com
On 2014-22-04 8:06, Thomas Hallgren wrote:
> Would a MANUAL strategy make sense? I.e. instead of rebooting the
> master, just tell it to clear the cache (perhaps per environment).
>
> - thomas
>
Circling back to this. Andy pointed out later that the best way to do
this is to get the web environment to do a graceful restart by either
sending apache or unicorn (depending on what is used) a HUP.
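
A deploy script could do this along the lines of the Python sketch below. The pidfile path is whatever your apache or unicorn setup writes (an assumption to check locally), and the function names are invented for this example. The signal is left as a parameter because Apache's own documentation describes USR1 as its graceful restart and HUP as "restart now", so it is worth verifying which one your setup needs.

```python
import os
import signal


def read_pid(pidfile):
    """Read the server's process id from its pidfile."""
    with open(pidfile) as handle:
        return int(handle.read().strip())


def request_reload(pidfile, sig=signal.SIGHUP, kill=os.kill):
    """Send 'sig' to the process named in 'pidfile' and return its pid.

    'kill' is injectable so the function can be tried without signalling
    a real process.
    """
    pid = read_pid(pidfile)
    kill(pid, sig)
    return pid
```

Called after new manifests have been deployed, this makes the reload an explicit step under the user's control, with no file watching inside the master.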

The problem with doing the manual cache invalidation is knowing which
running instance of the master to talk to, and it would either need an
IPC mechanism, or that all instances watch the same file - and then we
are back at the complex behavior we want to avoid...

- henrik

John Bollinger

Apr 23, 2014, 3:31:54 PM
to puppe...@googlegroups.com


On Tuesday, April 22, 2014 8:19:57 PM UTC-5, henrik lindberg wrote:
> The problem with doing the manual cache invalidation is knowing which
> running instance of the master to talk to

Why would you want to talk only to one? I can't think of a single reason. If you want to force a cache flush then you want to do it for all instances of the master.

> , and it would either need an
> IPC mechanism, or that all instances watch the same file - and then we
> are back at the complex behavior we want to avoid...

Well that's one of the advantages of ENVDIRCHANGE. All the instances watch the environment path directories for changes -- done. That's the directories themselves, not necessarily their contents. The whole cache goes stale, for each master instance, if the mtime of one of the environmentpath directories changes. I don't see how that yields anything nearly as complicated or quirky as the cache management approach available now.

If you wanted to provide a bit richer cache management feature set then the master could also watch the individual environment directories (again, the directories themselves) within the environment base directory. That could allow each environment's cache to be flushed independently.

A restart of the rack server takes time, during which Puppet would be unavailable. On a site afflicted with long catalog compilation times, or one where that master serves up large files, the restart could consume enough time to be a problem. Also, on a server that enforces fine-grained mandatory access controls it could be a much bigger deal to restart Puppet than just to touch a particular file.


John

Henrik Lindberg

Apr 23, 2014, 4:06:44 PM
to puppe...@googlegroups.com
On 2014-23-04 21:31, John Bollinger wrote:
>
>
> On Tuesday, April 22, 2014 8:19:57 PM UTC-5, henrik lindberg wrote:
>
> The problem with doing the manual cache invalidation is knowing which
> running instance of the master to talk to
>
>
>
> Why would you want to talk only to one? I can't think of a single
> reason. If you want to force a cache flush then you want to do it for /all/
> instances of the master.
>
> , and it would either need an
> IPC mechanism, or that all instances watch the same file - and then we
> are back at the complex behavior we want to avoid...
>
>
>
> Well that's one of the advantages of ENVDIRCHANGE. All the instances
> watch the environment path directories for changes -- done. That's the
> directories themselves, not necessarily their contents. The whole cache
> goes stale, for each master instance, if the mtime of one of the
> environmentpath directories changes. I don't see how that yields
> anything nearly as complicated or quirky as the cache management
> approach available now.
>
> If you wanted to provide a bit richer cache management feature set then
> the master could also watch the individual environment directories
> (again, the directories themselves) within the environment base
> directory. That could allow each environment's cache to be flushed
> independently.
>
This is actually more performant than checking all directories that can
hold environments since there is only a single stat per environment.

> A restart of the rack server takes time, during which Puppet would be
> unavailable. On a site afflicted with long catalog compilation times,
> or one where that master serves up large files, the restart could
> consume enough time to be a problem. Also, on a server that enforces
> fine-grained mandatory access controls it could be a much bigger deal to
> restart Puppet than just to touch a particular file.
>
>
If I understood it correctly, there is a graceful restart. Apache does
it very nicely by letting current workers finish their request, while
all dormant workers are replaced with a new generation. If I understand
that correctly, the server is never unavailable. We also looked at
unicorn/nginx which seems to do the same thing.

We decided to not implement any file/directory watching for 3.6 (hard
deadline in a couple of days). If the graceful restart proves a workable
solution we think that may be adequate, but we need people to try that
first. (We will reconsider if that turns out to not work, be really slow,
etc.)

For 3.6 (in the PR now available) we support timeout-based cache
eviction that can be controlled per environment. The default is set to
5 seconds, which is a compromise for small / out-of-the-box use and
development.
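As a rough illustration of timeout-based eviction with a per-environment timeout, here is a small Python sketch. It is not the code in the PR; the class `EnvironmentCache`, its methods, and the injectable `clock` are assumptions made for the example. Only the 5 second default comes from the description above.

```python
import time


class EnvironmentCache:
    """Timeout-based cache with a per-environment timeout (illustrative only)."""

    def __init__(self, default_timeout=5.0, clock=time.monotonic):
        self._default_timeout = default_timeout
        self._clock = clock
        self._timeouts = {}  # environment name -> timeout in seconds
        self._entries = {}   # environment name -> (value, expiry time)

    def set_timeout(self, env, seconds):
        # Override the default timeout for one environment.
        self._timeouts[env] = seconds

    def get(self, env, load):
        # Return the cached value, or call load(env) if it is missing or expired.
        now = self._clock()
        entry = self._entries.get(env)
        if entry is not None and now < entry[1]:
            return entry[0]
        value = load(env)
        timeout = self._timeouts.get(env, self._default_timeout)
        self._entries[env] = (value, now + timeout)
        return value
```

A timeout of 0 makes every `get` reload, which matches the "no caching" case for development environments, while a long timeout suits environments that rarely change.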

- henrik


John Bollinger

Apr 23, 2014, 4:12:10 PM
to puppe...@googlegroups.com


On Tuesday, April 22, 2014 7:16:47 PM UTC-5, Andy Parker wrote:
On Tue, Apr 22, 2014 at 1:32 PM, John Bollinger <john.bo...@stjude.org> wrote:

I could see the REBOOT strategy being used at very sensitive or tightly-controlled sites, but I'm inclined to think that ENVDIRCHANGE would be preferable to many people on account of the ability it affords to trigger cache invalidation without restarting the master.


Or in more performance oriented sites? It would reduce the number of stat calls that the master ends up doing. The thinking for this one was that in a production environment the master only should reread when the manifests have changed and that should be explicit. This can be done by signaling a graceful restart of the master for either passenger + apache or nginx + unicorn. For the apache setup it just takes sending a HUP to apache and for the nginx setup it takes sending a HUP to unicorn. I think that should provide a better deployment scenario for masters, but I might be wrong.


I think that option will be very attractive to some.  I do think it wrong, though, to characterize that alternative as "better" in an absolute sense.  The relative merits of the different alternatives depend to some extent on site-specific requirements, policy, and characteristics.


* ENVDIRCHANGE - watches the directory that represents
   the environment. Reloads if the directory itself is stale (using
   filetimeout setting to cap the number of times it checks). [...]



Perhaps it's unneeded, but that's the option I like best among those presented.  I like having a means to manually flush the cache without restarting the master(s).


Wouldn't a graceful restart work just as well? I like the REBOOT + graceful restart option because it keeps the behavior of puppet much simpler and under the control of the user.


Whether a graceful restart would work as well depends on the criteria by which you judge.  A restart must involve a service interruption, and that could be significant in some cases (either in the sense of "important" or in the sense of "long").  I'm thinking it might also cause some 'source'd File resources to fail needlessly for catalogs that have recently been served.

Moreover, those and any other such issues would be visited on each environment as governed by the combined needs of all environments.  For example, if the master hosts production and development environments, then every reboot required on account of changes to the dev environment would produce a service interruption for the production environment (too).  That runs a bit counter to the whole purpose of environments.

On the other hand, flushing the cache of a running instance does not need to involve any service interruption.  Furthermore, inasmuch as the user can force a flush by touching a directory -- and in most cases that would in fact be needed to flush without rebooting -- this approach puts the behavior almost as much under direct user control as the REBOOT option does.


John

Brice Figureau

Apr 28, 2014, 7:51:25 AM
to puppe...@googlegroups.com
On Wed, 2014-04-23 at 03:19 +0200, Henrik Lindberg wrote:
> On 2014-22-04 8:06, Thomas Hallgren wrote:
> > Would a MANUAL strategy make sense? I.e. instead of rebooting the
> > master, just tell it to clear the cache (perhaps per environment).
> >
> > - thomas
> >
> Circling back to this. Andy pointed out later that the best way to do
> this is to get the web environment to do a graceful restart by either
> sending apache or unicorn (depending on what is used) a HUP.

That should work for Apache or Nginx.

But what about the new users running a master on webrick?

Those are the users that we want to protect against:
* stale cache and endless issues to understand why the master doesn't
pick up the manifests changes
* bad performance if there's no caching at all. People are quick to
form opinions (especially when something doesn't work at first).

But maybe we just don't care about webrick (I don't remember whether
webrick support will be abandoned, or whether it already has been)?

> The problem with doing the manual cache invalidation is knowing which
> running instance of the master to talk to, and it would either need an
> IPC mechanism, or that all instances watch the same file - and then we
> are back at the complex behavior we want to avoid...

Well, having at most a dozen processes watching one given file shouldn't
be as harmful as having a dozen processes watching a ton of manifest
files...

--
Brice Figureau
My Blog: http://www.masterzen.fr/

Andy Parker

Apr 28, 2014, 1:54:32 PM
to puppe...@googlegroups.com
On Mon, Apr 28, 2014 at 4:51 AM, Brice Figureau <brice-...@daysofwonder.com> wrote:
On Wed, 2014-04-23 at 03:19 +0200, Henrik Lindberg wrote:
> On 2014-22-04 8:06, Thomas Hallgren wrote:
> > Would a MANUAL strategy make sense? I.e. instead of rebooting the
> > master, just tell it to clear the cache (perhaps per environment).
> >
> > - thomas
> >
> Circling back to this. Andy pointed out later that the best way to do
> this is to get the web environment to do a graceful restart by either
> sending apache or unicorn (depending on what is used) a HUP.

That should work for Apache or Nginx.

But what about the new users running a master on webrick?


I think setting the default timeout value to something short should handle these use cases. We are thinking that a default of around 15 seconds should be fine. In fact this is the default for the current system to rescan the files.
 
Those are the users that we want to protect against:
 * stale cache and endless issues to understand why the master doesn't
pick up the manifests changes
 * bad performance if there's no caching at all. People are quick to
form opinions (especially when something doesn't work at first).


Absolutely. These are exactly the concerns that I've had as we went over how to handle the cache invalidation.
 
But maybe we just don't care about webrick (I don't remember whether
webrick support will be abandoned, or whether it already has been)?


It is still around, but I don't consider it anything more than a quick and dirty way to stand up a master for testing and development (either of puppet itself or of puppet manifests). I would love to get rid of it, but there isn't a simple option to take its place right now.
 
> The problem with doing the manual cache invalidation is knowing which
> running instance of the master to talk to, and it would either need an
> IPC mechanism, or that all instances watch the same file - and then we
> are back at the complex behavior we want to avoid...

Well, having at most a dozen processes watching one given file shouldn't
be as harmful as having a dozen processes watching a ton of manifest
files...


OTOH, for a new user who is just starting to try things out, having it watch a single file (probably the environment directory) isn't going to be obvious. How are they to know that after editing a manifest they also need to go and touch a directory? So, while the directory watching could give a mechanism for triggering a cache invalidation for deployment scenarios, I don't think it is something that will help new users.
 
--
Brice Figureau
My Blog: http://www.masterzen.fr/
--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Trevor Vaughan

Apr 28, 2014, 2:55:02 PM
to puppe...@googlegroups.com
"OTOH, for a new user who is just starting to try things out, having it watch a single file (probably the environment directory) isn't going to be obvious. How are they to know that after editing a manifest they also need to go and touch a directory? So, while the directory watching could give a mechanism for triggering a cache invalidation for deployment scenarios, I don't think it is something that will help new users."

This is why I wanted a puppet face to do it.

puppet env refresh 'foobar' or the like

Also:

puppet env check -> Return the list of environments that are 'dirty'

Trevor




Daniele Sluijters

Apr 29, 2014, 3:35:33 AM
to puppe...@googlegroups.com
I'm with Trevor here. A Face to interact with environments and query for state would be really helpful to debug such issues or force a refresh.

Robin Bowes

Apr 29, 2014, 2:52:46 PM
to puppe...@googlegroups.com
On Tue, 2014-04-29 at 00:35 -0700, Daniele Sluijters wrote:
> I'm with Trevor here. A Face to interact with environments and query
> for state would be really helpful to debug such issues or force a
> refresh.

O/T: Are there any docs/guides around writing puppet faces?

Thanks,

R.


Joachim Thuau

May 1, 2014, 2:24:37 PM
to puppe...@googlegroups.com


On Tuesday, April 22, 2014 6:41:18 AM UTC-7, Trevor Vaughan wrote:
The more I read all of this, the more I'm a fan of a MANUAL mode.

A file per environment $ENV_HEAD/clear_cache or something that, when touched, will cause the cache to clear for that environment.

I also like the NONE option since, as was pointed out, dev/test environments shouldn't be caching.

The others don't interest me all that much in general.

If the per environment option is used, I'd like to see an associated puppet option to go with it.

# Clear the 'production' environment cache
puppet cache clear

# Clear the 'test' environment cache
puppet cache clear --env='test'



something like that could also be used with the source control system when something gets deployed, to refresh the environment once things are "updated" (svn up && puppet env refresh or whatever)...

if someone is just starting, just have a way to disable caching altogether, so you can quickly iterate on your 3 test machines + puppetmaster, and those of us who have large scale puppet masters can use that mechanism when we deploy new modules/config for puppet.

I'd like to have both "clear a specific env cache" and "clear everything" (although I would think a "clear all caches" would be sufficient at first)...

my $0.02
Jok

Andy Parker

May 1, 2014, 3:34:55 PM
to puppe...@googlegroups.com
On Thu, May 1, 2014 at 11:24 AM, Joachim Thuau <goo...@korigan.net> wrote:


On Tuesday, April 22, 2014 6:41:18 AM UTC-7, Trevor Vaughan wrote:
The more I read all of this, the more I'm a fan of a MANUAL mode.


Cool. I'm coming around to the viewpoint that we are going to have to support this, but it isn't going to be part of the 3.6.0 release. Unfortunately, because of other commitments we have for the 3.7 and 4.0 releases, I'm not sure we'll get to it then either. However, adding it in shouldn't be too difficult, so if anyone wants to take a stab at it, please do.
 
A file per environment $ENV_HEAD/clear_cache or something that, when touched, will cause the cache to clear for that environment.


Since every environment has a directory now, just touching the environment's directory would probably work just as well.
 
I also like the NONE option since, as was pointed out, dev/test environments shouldn't be caching.

The others don't interest me all that much in general.

If the per environment option is used, I'd like to see an associated puppet option to go with it.


All of the caching control is now on a per-environment basis. The new environment.conf file can specify that particular environment's environment_timeout value. So in order to get this kind of behavior we would either add a new setting for the environment.conf or add a new valid value for environment_timeout. I'm thinking that adding a "manual" value to environment.conf would be doable.
 
# Clear the 'production' environment cache
puppet cache clear

# Clear the 'test' environment cache
puppet cache clear --env='test'


I think those commands would be easily done. You just need to list the environments, get the configuration for each one and touch the environment's directory. Right now we don't make the environment's directory available on that object in the code, AFAIK, but exposing that should be trivial.
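The "list the environments and touch the directory" idea could look roughly like the Python sketch below. It assumes the master invalidates an environment's cache when that environment directory's mtime changes, which is the proposal under discussion and not something 3.6 does. The function names are hypothetical, and the `production` default only mirrors the proposed `puppet cache clear` command.

```python
import os


def list_environments(environmentpath):
    """Return the names of the directory environments under environmentpath."""
    return sorted(
        entry.name for entry in os.scandir(environmentpath) if entry.is_dir()
    )


def _touch(path):
    # os.utime with times=None sets atime and mtime to the current time.
    os.utime(path, None)


def clear_cache(environmentpath, env="production"):
    """Touch a single environment's directory (hypothetical 'cache clear')."""
    _touch(os.path.join(environmentpath, env))
    return [env]


def clear_all_caches(environmentpath):
    """Touch every environment directory (hypothetical 'clear everything')."""
    names = list_environments(environmentpath)
    for name in names:
        _touch(os.path.join(environmentpath, name))
    return names
```

A deployment hook could then call `clear_cache(path, "test")` right after updating the test environment, without affecting production.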
 


something like that could also be used with the source control system when something gets deployed, to refresh the environment once things are "updated" (svn up && puppet env refresh or whatever)...


Yes. The way of doing this without a manual refresh is to send a graceful restart to the master's container.
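For a deployment hook, sending the HUP could be sketched in Python as below. The function name is made up, and the pid file location is entirely site-specific (it is a parameter here, not a known path). Whether HUP triggers a graceful restart depends on the container, as discussed earlier in the thread.

```python
import os
import signal


def request_graceful_restart(pid_file):
    """Send SIGHUP to the process whose PID is stored in pid_file.

    pid_file is assumed to hold the PID of the process that should
    receive the HUP (apache or unicorn, depending on the setup).
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)
    return pid
```

This would typically run right after the checkout/pull step, so that the restart only happens once the new manifests are fully in place.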
 
if someone is just starting, just have a way to disable caching altogether, so you can quickly iterate on your 3 test machines + puppetmaster, and those of us who have large scale puppet masters can use that mechanism when we deploy new modules/config for puppet.

I'd like to have both "clear a specific env cache" and "clear everything" (although I would think a "clear all caches" would be sufficient at first)...

my $0.02
Jok


Henrik Lindberg

May 1, 2014, 5:26:42 PM
to puppe...@googlegroups.com
On 2014-01-05 21:34, Andy Parker wrote:
> On Thu, May 1, 2014 at 11:24 AM, Joachim Thuau <goo...@korigan.net
> <mailto:goo...@korigan.net>> wrote:

>
> All of the caching control is now on a per-environment basis. The new
> environment.conf file can specify that particular environment's
> environment_timeout value. So in order to get this kind of behavior
> we would either add a new setting for the environment.conf or add a new
> valid value for environment_timeout. I'm thinking that adding a "manual"
> value to environment.conf would be doable.
>
> # Clear the 'production' environment cache
> puppet cache clear
>
> # Clear the 'test' environment cache
> puppet cache clear --env='test'
>
>
> I think those commands would be easily done. You just need to list the
> environments, get the configuration for each one and touch the
> environment's directory. Right now we don't make the environment's
> directory available on that object in the code, AFAIK, but exposing that
> should be trivial.
>

I can imagine that MANUAL control is always there irrespective of the
timeout. This way you can have a long timeout set (say an hour), but you
can also manually touch the directory. This way no additional settings
are required. I don't think there is any harm in always allowing a
manual refresh. However, if changes are made as a checkout/pull of a
complete environment, it may not behave right depending on the order the
files are updated (which depends on the SCM system in use), so I am not
yet convinced that we can simply use the environment's directory itself
as the file to watch.

It was complications like these that made us not implement this in
a way that at first appears to be both simple and obvious (i.e. watch
the environment's directory).

A safe implementation would be to have a second hierarchy of files that
are used solely for cache control. If we have a single control that just
flushes all caches the design is somewhat less ugly...
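A sketch of what "manual refresh always available, plus a timeout, with a separate control hierarchy" might look like in Python. Everything here is hypothetical: the control directory holding one marker file per environment, the class name, and the injectable clock. The only point is that the marker lives outside the environment tree, so an SCM checkout that rewrites the environment never touches it.

```python
import os
import time


def marker_mtime(control_dir, env):
    """mtime (ns) of the environment's marker file, or None if it is absent."""
    try:
        return os.stat(os.path.join(control_dir, env)).st_mtime_ns
    except FileNotFoundError:
        return None


class CachedEnvironment:
    """A cache entry that expires on timeout OR when its marker is touched."""

    def __init__(self, env, control_dir, timeout, clock=time.monotonic):
        self.env = env
        self.control_dir = control_dir
        self._clock = clock
        self._expires_at = clock() + timeout
        # Remember the marker state at the time the entry was created.
        self._marker = marker_mtime(control_dir, env)

    def expired(self):
        # The timeout applies regardless of the marker.
        if self._clock() >= self._expires_at:
            return True
        # Manual refresh: the marker was created, touched, or removed.
        return marker_mtime(self.control_dir, self.env) != self._marker
```

With a long timeout (say an hour) the entry still goes stale as soon as someone touches the marker, and no extra setting is needed to enable the manual path.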

I so want to see a file called 'tickle_me_elmo' :-)

- henrik


Stig Sandbeck Mathisen

May 1, 2014, 4:56:02 PM
to puppe...@googlegroups.com
Trevor Vaughan <tvau...@onyxpoint.com> writes:

> puppet env refresh 'foobar' or the like

That would make it easy to invalidate an environment cache from
deployment tools like "r10k", or from external data sources used by
functions or hiera with a remote backend.

--
Stig Sandbeck Mathisen

Daniele Sluijters

May 8, 2014, 2:23:22 PM
to puppe...@googlegroups.com
So 3.6.0-rc1 is out but the new caching mechanism seems pretty much useless to me without a face or method to invalidate that cache (restarting the Puppetmaster is not a good method unless there's some way to do it with a SIGHUP or USR1/2).

The only way I can see this working for most people is having an unlimited cache lifetime and invalidating that cache at deploy time. I'm not convinced that 'long lived low traffic environments like production' can benefit from large cache times, because when you roll out a change you want it now, not now + timeout.

Andy Parker

May 8, 2014, 4:48:48 PM
to puppe...@googlegroups.com
On Thu, May 8, 2014 at 11:23 AM, Daniele Sluijters <daniele....@gmail.com> wrote:
So 3.6.0-rc1 is out but the new caching mechanism seems pretty much useless to me without a face or method to invalidate that cache (restarting the Puppetmaster is not a good method unless there's some way to do it with a SIGHUP or USR1/2).


Thanks for the feedback. There are a couple things that I'm curious about. Are you not using a server that supports a graceful restart (apache + passenger, nginx + unicorn and various combinations appear to)? 
 
The only way I can see this working for most people is having an unlimited cache lifetime and invalidating that cache at deploy time. I'm not convinced that 'long lived low traffic environments like production' can benefit from large cache times, because when you roll out a change you want it now, not now + timeout.

You've always had now + timeout. What happened previously was that puppet used the "filetimeout" setting to throttle how often it would stat all of the manifest files that it had loaded. If any of those files had changed, then it would trigger a reload, but not before the "filetimeout" expired. This had the side effect that if a new file appeared, it wasn't guaranteed that puppet would pick it up unless another file that it was already watching changed as well. In effect the new system trades off reparsing the manifests for reducing stat calls. Sometimes that will end up being faster, sometimes that will end up being slower.
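The old behavior described here can be sketched in Python as follows. This is a simplified model, not Puppet's code: the class name and methods are invented, and the 15 second default is the figure mentioned earlier in the thread for the current rescan interval.

```python
import os
import time


class LegacyFileWatcher:
    """Stats every loaded file, but at most once per filetimeout (a model)."""

    def __init__(self, filetimeout=15.0, clock=time.monotonic):
        self._filetimeout = filetimeout
        self._clock = clock
        self._mtimes = {}  # path -> mtime (ns) recorded when the file was loaded
        self._next_check = clock() + filetimeout

    def loaded(self, path):
        # Called when a manifest is parsed; only these files are ever watched.
        self._mtimes[path] = os.stat(path).st_mtime_ns

    @staticmethod
    def _changed(path, mtime):
        try:
            return os.stat(path).st_mtime_ns != mtime
        except FileNotFoundError:
            return True  # a watched file that disappeared counts as changed

    def needs_reload(self):
        now = self._clock()
        if now < self._next_check:
            return False  # throttled: no stat calls until filetimeout expires
        self._next_check = now + self._filetimeout
        # One stat per loaded file; files that were never loaded are invisible.
        return any(self._changed(p, m) for p, m in self._mtimes.items())
```

The model shows both effects mentioned above: a change is only noticed after the filetimeout has expired, and a newly added file is never noticed on its own because it is not in the watched set.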


Andy Parker

May 8, 2014, 5:17:01 PM
to puppe...@googlegroups.com
On Wed, Apr 23, 2014 at 12:31 PM, John Bollinger <john.bo...@stjude.org> wrote:


On Tuesday, April 22, 2014 8:19:57 PM UTC-5, henrik lindberg wrote:
The problem with doing the manual cache invalidation is knowing which
running instance of the master to talk to


Why would you want to talk only to one?  I can't think of a single reason.  If you want to force a cache flush then you want to do it for all instances of the master.

 
, and it would either need an
IPC mechanism, or that all instances watch the same file - and then we
are back at the complex behavior we want to avoid...



Well that's one of the advantages of ENVDIRCHANGE.  All the instances watch the environment path directories for changes -- done.  That's the directories themselves, not necessarily their contents.  The whole cache goes stale, for each master instance, if the mtime of one of the environmentpath directories changes.  I don't see how that yields anything nearly as complicated or quirky as the cache management approach available now.

If you wanted to provide a bit richer cache management feature set then the master could also watch the individual environment directories (again, the directories themselves) within the environment base directory.  That could allow each environment's cache to be flushed independently.


I've opened PUP-2520 to track this request. If anyone wants to take a stab at this, please feel free. In the description of the issue, I provided one way (the one described here) to do this, but if anyone who implements it has other ideas, please explain it and submit that as a patch instead.
 
A restart of the rack server takes time, during which Puppet would be unavailable.  On a site afflicted with long catalog compilation times, or one where that master serves up large files, the restart could consume enough time to be a problem.  Also, on a server that enforces fine-grained mandatory access controls it could be a much bigger deal to restart Puppet than just to touch a particular file.


John




Brice Figureau

May 11, 2014, 8:03:06 AM
to puppe...@googlegroups.com
On 08/05/2014 23:17, Andy Parker wrote:
> On Wed, Apr 23, 2014 at 12:31 PM, John Bollinger
> <john.bo...@stjude.org <mailto:john.bo...@stjude.org>> wrote:
>
>
>
> On Tuesday, April 22, 2014 8:19:57 PM UTC-5, henrik lindberg wrote:
>
> The problem with doing the manual cache invalidation is knowing
> which
> running instance of the master to talk to
>
>
>
> Why would you want to talk only to one? I can't think of a single
> reason. If you want to force a cache flush then you want to do it for
> /all/ instances of the master.
>
>
>
> , and it would either need an
> IPC mechanism, or that all instances watch the same file - and
> then we
> are back at the complex behavior we want to avoid...
>
>
>
> Well that's one of the advantages of ENVDIRCHANGE. All the
> instances watch the environment path directories for changes --
> done. That's the directories themselves, not necessarily their
> contents. The whole cache goes stale, for each master instance, if
> the mtime of one of the environmentpath directories changes. I
> don't see how that yields anything nearly as complicated or quirky
> as the cache management approach available now.
>
> If you wanted to provide a bit richer cache management feature set
> then the master could also watch the individual environment
> directories (again, the directories themselves) within the
> environment base directory. That could allow each environment's
> cache to be flushed independently.
>
>
> I've opened PUP-2520 to track this request. If anyone wants to take a
> stab at this, please feel free. In the description of the issue, I
> provided one way (the one described here) to do this, but if anyone who
> implements it has other ideas, please explain it and submit that as a
> patch instead.

I just sent a github PR to implement the 'manual' setting along with a
face to invalidate a given environment cache:

https://github.com/puppetlabs/puppet/pull/2638

> A restart of the rack server takes time, during which Puppet would
> be unavailable. On a site afflicted with long catalog compilation
> times, or one where that master serves up large files, the restart
> could consume enough time to be a problem. Also, on a server that
> enforces fine-grained mandatory access controls it could be a much
> bigger deal to restart Puppet than just to touch a particular file.

All people watching this thread, please try the patch and let me know if
that works or not for you.