Should we attempt to avoid recompiling the catalog?

Luke Kanies

Apr 9, 2008, 12:57:51 PM
to puppe...@googlegroups.com
Hi all,

As promised, here's my follow-up to the email on caching.

Almost since its creation, Puppet has attempted to avoid recompiling
its catalog when possible. I think I originally did this as an
optimization, and I'm beginning to think I've been holding onto this
feature for no good reason.

Turns out that it kinda broke sometime in 0.24. The catalog gets
recompiled every time anyway. What I'm trying to do right now is
figure out whether it's worth trying to fix that, or if I should just
carve out the code that does this.

As an aside, I think the broken bit is that the client is not
correctly caching its facts. It always compares its facts against an
old version, so they always appear to have changed, which always
forces a recompile. That bit should probably be fixed.

Upside of carving it out:

No more worries about why the heck the catalog isn't being
recompiled. I've had lots of bugs filed against this -- it's not
picking up new files, it's not noticing changes in my external node
store, it's not noticing changes in facts, etc. These bugs magically
go away (well, they're already gone, but no worries about more
appearing).

Less code is always good when reasonable. In this case, there's a
good bit of infrastructure around knowing if we should recompile (see
below).

No premature optimization -- if we don't need to avoid recompiling, we
shouldn't.

Downside:

Compiling is relatively expensive, especially when using stored
configs. In most cases, the configuration hasn't changed, so it ends
up being a lot of wasted effort. It doesn't really feel like a
premature optimization, considering the potential savings.

So, question 1: Do we just get rid of selective recompiling? If so,
then you can stop reading. :)

Here's how the selective recompiling works right now:

The client collects its facts and compares them to the facts it cached
last time; if they've changed, it recompiles.

The client asks the server for the "version" of the configuration it's
currently running. The client compares this timestamp to the
timestamp of its configuration. If the server's timestamp is later,
then the client recompiles.

The server calculates the configuration version as the later of the
last parse time, the timestamp of the client's facts, and the
timestamp of the node configuration. (Well, this is actually a lie --
the current code ignores the node and fact info; this is the code that
is in place but not yet used.) This would work if we were sticking to
versions for things like nodes and facts, but (as my last email
explained) we're using TTLs for them instead.
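
A minimal Python sketch of the three checks described above, for readers who want them in one place. This is not Puppet's code: the function and parameter names are made up for illustration, and timestamps are assumed to be plain numbers.

```python
def server_config_version(last_parse, facts_timestamp, node_timestamp):
    """Version as described above: the latest of the three timestamps."""
    return max(last_parse, facts_timestamp, node_timestamp)


def client_should_recompile(current_facts, cached_facts,
                            server_version, catalog_timestamp):
    """Recompile if the facts changed or the server's version is newer."""
    if current_facts != cached_facts:
        return True
    return server_version > catalog_timestamp
```

If the cached facts are always stale, the first comparison is true on every run, which would match the always-recompile behaviour mentioned earlier.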

How selective recompiling *could* work:

In the switch to TTLs, the server could rely on the expiration date of
the node and facts -- if either is expired, then a recompile is
forced. This doesn't really make sense, though, because you'd
normally just set the ttl to the runinterval, which means they'd
automatically expire after 30 mins, forcing a recompile every time.
So, this would only work if you tuned the ttl higher.
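
To make the arithmetic concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the names are hypothetical, times are in seconds, data counts as expired once its age reaches the TTL, and node and fact data are refreshed whenever a recompile is forced.

```python
def is_expired(stored_at, ttl, now):
    """Assumption: data is expired once its age reaches the TTL."""
    return now >= stored_at + ttl


def recompiles_forced(ttl, runinterval, runs):
    """Count how many of `runs` consecutive client runs force a recompile."""
    stored_at = 0
    forced = 0
    for run in range(1, runs + 1):
        now = run * runinterval
        if is_expired(stored_at, ttl, now):
            forced += 1
            stored_at = now  # data is refreshed as part of the recompile
    return forced
```

Under these assumptions, a ttl equal to a 30-minute runinterval (1800 seconds) forces a recompile on every run, while a ttl of 7200 seconds forces one only every fourth run.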

You could just ignore the server's node and fact information,
continuing to rely on the time of the last parse: If the client's
catalog is older than the last parse, recompile. I can basically
guarantee this will cause all kinds of problems, though, because of
external node sources.
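
A small Python sketch of why the parse-time-only check misses external changes. The names are hypothetical and timestamps are plain numbers; it only illustrates the comparison described above.

```python
def should_recompile_parse_only(catalog_timestamp, last_parse):
    """Recompile only if the client's catalog is older than the last parse."""
    return catalog_timestamp < last_parse


def misses_external_change(catalog_timestamp, last_parse, external_change_at):
    """True when external data changed after the catalog was built,
    yet the parse-time check does not ask for a recompile."""
    changed_since_catalog = external_change_at > catalog_timestamp
    return changed_since_catalog and not should_recompile_parse_only(
        catalog_timestamp, last_parse)
```

A change in an external node source does not touch the manifests, so the last parse time stays put and the client keeps its old catalog.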

I'm leaning toward just getting rid of this feature entirely.

Any dissenters? Is there anything I'm missing?

--
Basic research is what I am doing when I don't know what I am doing.
--Wernher von Braun
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com

Adam Jacob

Apr 9, 2008, 4:28:51 PM
to puppe...@googlegroups.com
On Wed, Apr 9, 2008 at 9:57 AM, Luke Kanies <lu...@madstop.com> wrote:
> So, question 1: Do we just get rid of selective recompiling? If so,
> then you can stop reading. :)

I vote for getting rid of it.

I already run with --ignore-cache, since we're using an external node
tool that might cause changes that puppet doesn't pick up in the
normal course of things.

As for the slowdown for people using stored configs, the right long
term answer here might be to write those storage requests out to a
queue instead of directly to the database anyway. That un-bundles the
act of feeding a client the manifest list from recording what has been
done for posterity. (You could even look at using my early
implementation of runnels to make this work, although it might be
overkill.)
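
A rough Python sketch of the queue idea, using only the standard library. It is an assumption-laden illustration, not Puppet or runnels code: `serve_catalog`, `storage_worker`, the `compile_catalog` callable, and the dict standing in for the database are all hypothetical.

```python
import queue
import threading


def serve_catalog(node, compile_catalog, storage_queue):
    """Compile and return the catalog; storage is only queued, not performed."""
    catalog = compile_catalog(node)
    storage_queue.put((node, catalog))
    return catalog


def storage_worker(storage_queue, store):
    """Drain the queue into `store` (a stand-in for the database)."""
    while True:
        item = storage_queue.get()
        if item is None:  # sentinel: stop the worker
            break
        node, catalog = item
        store[node] = catalog


storage_queue = queue.Queue()
store = {}
worker = threading.Thread(target=storage_worker, args=(storage_queue, store))
worker.start()
catalog = serve_catalog("web01", lambda node: {"node": node}, storage_queue)
storage_queue.put(None)
worker.join()
```

The client gets its catalog as soon as compilation finishes; the write to the store happens on the worker's schedule.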

Adam

--
HJK Solutions - We Launch Startups - http://www.hjksolutions.com
Adam Jacob, Senior Partner
T: (206) 508-4759 E: ad...@hjksolutions.com

Luke Kanies

Apr 9, 2008, 4:42:40 PM
to puppe...@googlegroups.com
On Apr 9, 2008, at 3:28 PM, Adam Jacob wrote:

>
> On Wed, Apr 9, 2008 at 9:57 AM, Luke Kanies <lu...@madstop.com> wrote:
>> So, question 1: Do we just get rid of selective recompiling? If so,
>> then you can stop reading. :)
>
> I vote for getting rid of it.
>
> I already run with --ignore-cache, since we're using an external node
> tool that might cause changes that puppet doesn't pick up in the
> normal course of things.

Ok.

>
> As for the slowdown for people using stored configs, the right long
> term answer here might be to write those storage requests out to a
> queue instead of directly to the database anyway. That un-bundles the
> act of feeding a client the manifest list from recording what has been
> done for posterity. (You could even look at using my early
> implementation of runnels to make this work, although it might be
> overkill.)

That's been my long term plan all along, and will be much easier once
the current server-side code is replaced with REST, which I've been
talking about so long now it's starting to feel like a myth (even
though we're actually making great progress).

Basically, you just configure the server to cache the client catalogs
however you want, and you treat that cache as your queue.

--
He played the king as if afraid someone else would play the ace.
--John Mason Brown

Blake Barnett

Apr 9, 2008, 4:43:08 PM
to puppe...@googlegroups.com

On Apr 9, 2008, at 1:28 PM, Adam Jacob wrote:

>
> On Wed, Apr 9, 2008 at 9:57 AM, Luke Kanies <lu...@madstop.com> wrote:
>> So, question 1: Do we just get rid of selective recompiling? If so,
>> then you can stop reading. :)
>
> I vote for getting rid of it.
>
> I already run with --ignore-cache, since we're using an external node
> tool that might cause changes that puppet doesn't pick up in the
> normal course of things.
>
> As for the slowdown for people using stored configs, the right long
> term answer here might be to write those storage requests out to a
> queue instead of directly to the database anyway. That un-bundles the
> act of feeding a client the manifest list from recording what has been
> done for posterity. (You could even look at using my early
> implementation of runnels to make this work, although it might be
> overkill.)

I think the long-term solution is to decouple storeconfigs from
puppetmasterd (which Luke and I have talked a lot about), and have an
external tool (puppetshow, iclassify, etc.) that can query
puppetmasterd for node information via REST. In this scenario it
would be great if there were a cache so that the external tools can
query for only what's changed since <timestamp>. I think it would be
A Bad Thing if a tool like this were to query for all nodes and
puppetmasterd would have to compile everything every time.
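
As a rough illustration of the "only what's changed since <timestamp>" query, here is a minimal Python sketch. The cache layout (node name mapped to an update time and the node information) and the function name are assumptions made for the example, not an existing puppetmasterd interface.

```python
def changed_since(cache, timestamp):
    """Return info for nodes whose cached entry was updated after `timestamp`.

    `cache` maps node name -> (updated_at, node_info).
    """
    return {
        name: info
        for name, (updated_at, info) in cache.items()
        if updated_at > timestamp
    }


cache = {
    "web01": (100, {"classes": ["apache"]}),
    "db01": (300, {"classes": ["mysql"]}),
}
recent = changed_since(cache, 200)
```

An external tool would remember the time of its last query and pass it in, so nothing needs compiling for nodes that have not changed.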

-Blake

Blake Barnett

Apr 9, 2008, 4:44:09 PM
to puppe...@googlegroups.com

Yeah. Essentially what I was saying in my last email.

-Blake

Arjuna Christensen

Apr 9, 2008, 6:02:37 PM
to puppe...@googlegroups.com
I'm inclined to agree that caching is extraneous for 'perfect' behaviour. I always do *every* node run, even --noop runs, with --test (and/or its longer equivalents).

Arjuna Christensen | Systems Engineer 
Maximum Internet Ltd
DDI: +64 9 913 9683 | Ph: +64 9 915 1825 | Fax: +64 9 300 7227
arjuna.ch...@maxnet.co.nz | www.maxnet.co.nz

David Schmitt

Apr 10, 2008, 3:24:40 AM
to puppe...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 09 April 2008, Adam Jacob wrote:
> On Wed, Apr 9, 2008 at 9:57 AM, Luke Kanies <lu...@madstop.com> wrote:
> > So, question 1: Do we just get rid of selective recompiling? If so,
> > then you can stop reading. :)
>
> I vote for getting rid of it.

+1

> I already run with --ignore-cache, since we're using an external node
> tool that might cause changes that puppet doesn't pick up in the
> normal course of things.
>
> As for the slowdown for people using stored configs, the right long
> term answer here might be

.. what Luke wrote in the other mail:
> So, if performance is the goal, really, the time is better spent
> optimizing existing code, rather than trying to avoid doing the work.


Thanks, DavidS

- --
The primary freedom of open source is not the freedom from cost, but the
freedom to shape software to do what you want. This freedom is /never/
exercised without cost, but is available /at all/ only by accepting the very
different costs associated with open source, costs not in money, but in time
and effort.
- -- http://www.schierer.org/~luke/log/20070710-1129/on-forks-and-forking
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH/cDQ/Pp1N6Uzh0URAm/lAKCCV6o5L1MxFqYI/YKlw3NDHHpJTgCgoif0
z3R5yP/on5yeW0mlD3xqE7M=
=uGWn
-----END PGP SIGNATURE-----

James Turnbull

Apr 10, 2008, 8:49:57 AM
to puppe...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Schmitt wrote:
> On Wednesday 09 April 2008, Adam Jacob wrote:
>> On Wed, Apr 9, 2008 at 9:57 AM, Luke Kanies <lu...@madstop.com> wrote:
>>> So, question 1: Do we just get rid of selective recompiling? If so,
>>> then you can stop reading. :)
>> I vote for getting rid of it.
>
> +1
>

+1

Regards

James


- --
James Turnbull (ja...@lovedthanlost.net)
- --
Author of:
- - Pulling Strings with Puppet
(http://www.amazon.com/gp/product/1590599780/)
- - Pro Nagios 2.0
(http://www.amazon.com/gp/product/1590596099/)
- - Hardening Linux
(http://www.amazon.com/gp/product/1590594444/)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH/gz19hTGvAxC30ARAlG4AJ4u5MxpD0bEveIWQE3v66HrfzPvHwCg1V9n
ZX3+LaS6Q8HD94qjw8t1z0E=
=5sr7
-----END PGP SIGNATURE-----

Luke Kanies

Apr 10, 2008, 11:15:37 AM
to puppe...@googlegroups.com
On Apr 10, 2008, at 7:49 AM, James Turnbull wrote:

> David Schmitt wrote:
>> On Wednesday 09 April 2008, Adam Jacob wrote:
>>> On Wed, Apr 9, 2008 at 9:57 AM, Luke Kanies <lu...@madstop.com>
>>> wrote:
>>>> So, question 1: Do we just get rid of selective recompiling? If
>>>> so,
>>>> then you can stop reading. :)
>>> I vote for getting rid of it.
>>
>> +1
>>
>
> +1


The consensus is clear, then; this "optimization" goes away.

Thanks for the feedback.

--
It's very hard to predict things . . . Especially the future.
-- Prof. Charles Kelemen, Swarthmore CS Dept.
