modules, violating puppet principles?


Philip Brown

May 11, 2012, 1:00:36 PM
to puppe...@googlegroups.com
I've just started experimenting with using some modules on our puppet installation, and observing the behaviour.
From those observations, it suddenly struck me that the current module implementation violates what has previously been described as good puppet design.

It has previously been said, by multiple people in multiple places, in variants of:
"Don't use puppet to distribute a lot of files; it's inefficient! Use rsync, or (that other file transport thingie)"
or, "Use packages!"

Oddly, the new module architecture, and plugins, in general, seem to violate both principles.

In my testing with puppet version 2.7.9, I dropped the files for the stdlib module into the module dir and ran the client side. It synced up. Okay, great.
Then I created a new random .rb file under stdlib/lib/puppet.

It got synced on the next run.

"Hmm.. maybe it just tests for dates on directories?" I thought to myself. "new file = new sync?"

So I tested this by updating just the existing file.
It got resynced.

In stdlib alone, there are 63 files: a not insignificant number.
And this syncing is done via full MD5 checksum? That is **less** efficient than normal rsync, which normally checks just timestamp and size!

Does this module/plugin design not violate long-standing puppet "best practices" ?

Along with my critique, I will also offer some suggested fixes.

A) Make plugin syncing work rsync-style: only resync if the timestamp changed.
B) Make plugin syncing more "package"-like: check the version in the Modulefile, compare versions on client and server, and update only on a version mismatch.
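A rough sketch of suggestion (B), assuming a trivial Modulefile parser. The helper names here (parse_version, needs_sync?) are hypothetical, not anything from the Puppet codebase:

```ruby
# Hypothetical sketch of suggestion (B): skip plugin syncing entirely
# when the module version recorded on the agent matches the master's.

def parse_version(modulefile_text)
  # Modulefile entries look like: version '1.3.1'
  modulefile_text[/^\s*version\s+['"]([^'"]+)['"]/, 1]
end

def needs_sync?(local_modulefile, remote_modulefile)
  local  = parse_version(local_modulefile)
  remote = parse_version(remote_modulefile)
  # Resync if either side lacks a version, or the versions differ.
  local.nil? || remote.nil? || local != remote
end

needs_sync?("version '1.3.1'", "version '1.3.1'")  # => false
needs_sync?("version '1.3.1'", "version '1.3.2'")  # => true
```

The obvious trade-off is that this only works if module authors reliably bump the version on every change, which is exactly the caveat raised below.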



Trevor Vaughan

May 11, 2012, 1:13:05 PM
to puppe...@googlegroups.com
This is a good idea but not everyone has a Modulefile in their modules.

It's not a bad idea to start requiring one though for just this purpose.

I would suggest checking the file size before checking anything else.
If they differ, resync.
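A sketch of that "check size first" idea. File.size is a cheap stat() call, so a size mismatch lets us decide without reading either file; only on matching sizes do we fall back to the expensive content checksum. (Illustrative only; this is not how Puppet's pluginsync is actually implemented.)

```ruby
require 'digest'

# Returns true if the two files differ, checking the cheap signal
# (size, via stat) before the expensive one (full MD5 of contents).
def files_differ?(path_a, path_b)
  return true if File.size(path_a) != File.size(path_b)
  Digest::MD5.file(path_a).hexdigest != Digest::MD5.file(path_b).hexdigest
end
```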

Trevor
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/puppet-dev/-/bNfqZY_aJ6YJ.
> To post to this group, send email to puppe...@googlegroups.com.
> To unsubscribe from this group, send email to
> puppet-dev+...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/puppet-dev?hl=en.



--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
tvau...@onyxpoint.com

-- This account not approved for unencrypted proprietary information --

Philip Brown

May 11, 2012, 1:18:04 PM
to puppe...@googlegroups.com


On Friday, May 11, 2012 10:13:05 AM UTC-7, Trevor Vaughan wrote:
> This is a good idea but not everyone has a Modulefile in their modules.
>
> It's not a bad idea to start requiring one though for just this purpose.
>
> I would suggest checking the file size before checking anything else.
> If they differ, resync.


The timestamp is more important than the file size. There could easily be a one-char syntax fix, or a transposition fix, that leaves the size unchanged.
But a timestamp doesn't go backwards unless someone deliberately sets it.
 

Trevor Vaughan

May 11, 2012, 1:25:41 PM
to puppe...@googlegroups.com
Oh, what I meant was to check the file size first, not to ignore the timestamp.

Also, you do have to watch timestamps on a highly loaded virtualized
environment. Clocks can skew quite a bit.

I also didn't mean that Modulefiles should be required, but that they
should be supported if present.

I've got to slow down my typing....

Trevor

Philip Brown

May 11, 2012, 1:32:11 PM
to puppe...@googlegroups.com


On Friday, May 11, 2012 10:25:41 AM UTC-7, Trevor Vaughan wrote:
> Oh, what I meant was to check the file size first, not to ignore the timestamp.
>
> Also, you do have to watch timestamps on a highly loaded virtualized
> environment. Clocks can skew quite a bit.



Even in that sort of situation, if puppet did it in proper rsync style, it wouldn't matter. If the client sets the timestamp on its file to be the same as the master's, then you no longer have a "greater than" comparison, but an "is equal to" comparison. Skew all you want: with millisecond resolution (or sometimes finer!), the chances of accidentally overlooking non-matching files are almost nil.

.. unfortunately, puppet does not do this currently, either. It does not set the client file timestamp to match the master's, at present.
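A sketch of the rsync-style behaviour described above: when writing the synced copy, stamp it with the master's mtime rather than "now", so later runs can use an equality comparison that is immune to clock skew. (The function names and the way content arrives are simplified placeholders, not Puppet internals.)

```ruby
# Write a synced file and give it the master's mtime, not "now".
def sync_file(local_path, content, master_mtime)
  File.write(local_path, content)
  # atime can be anything; mtime must match the master's exactly.
  File.utime(Time.now, master_mtime, local_path)
end

# Equality test: the file is in sync only if both size and mtime
# match the master's metadata exactly. No "newer than" logic, so
# clock skew between agent and master is irrelevant.
def in_sync?(local_path, master_mtime, master_size)
  File.exist?(local_path) &&
    File.size(local_path) == master_size &&
    File.mtime(local_path) == master_mtime
end
```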

Ryan Coleman

May 11, 2012, 4:55:15 PM
to puppe...@googlegroups.com

On Friday, May 11, 2012 10:00:36 AM UTC-7, Philip Brown wrote:
> I've just started experimenting with using some modules on our puppet installation, and observing behaviour.
> From these observations, it suddenly struck me, that the current module implementation, violates what has previously been described as good puppet design.
>
> It has previously been said, by multiple people in multiple places, variants of,
> "Dont use puppet to distribute a lot of files; it's inefficient! use rsync, or (that other file transport thingie)"
> or, "Use packages!"
>
> Oddly, the new module architecture, and plugins, in general, seem to violate both principles.


These statements were generally made when people were trying to deploy web applications with hundreds of files, often as part of their release process. For those purposes, using native packaging is a viable alternative. Puppet has also received significant performance improvements in recursive file serving that make the process more tolerable now.

Plugins are a different animal. They are distributed to an agent and don't change again unless the custom fact, function, or whatever is removed or modified on the Puppet master. Personally, I have different performance concerns for that type of process than for deploying my web application to hundreds of clients several times a week (or day).

Jeff McCune

May 11, 2012, 9:40:19 PM
to puppe...@googlegroups.com
And to jump on this...

We absolutely have to make sure synchronizing plugins is fast and efficient. If something like stdlib is a performance issue by the nature of containing lots of additional functionality, then we consider that a bug and we'll fix it.

For me, hundreds of file resources aren't really a concern in puppet today. Even thousands should be fine.

It's large files that are a concern. If you have a hundred files of about a meg each, then that's where we have concerns.

--
Jeff McCune


Philip Brown

May 11, 2012, 11:06:08 PM
to puppe...@googlegroups.com


On Friday, May 11, 2012 6:40:19 PM UTC-7, Jeff McCune wrote:
> And to jump on this...
>
> We absolutely have to make sure synchronizing plugins is fast and efficient. If something like stdlib is a performance issue by its nature of containing lots of additional functionality then we consider that a bug and we'll fix it.
>
> For me, hundreds of file resources aren't really a concern in puppet today. even thousands should be fine.
>
> It's large files that are a concern. if you have a hundred files of about a meg each then that's where we have concerns.



It's interesting that you are concerned about one aspect, but not at all about another.

To give an extreme example of why I care: we used to have a large rsync job that ran every night, transferring files between one host and another on a relatively large filesystem. A full resync, where the other side was wiped, took something like an hour.

However, a run where everything was already in sync took *half an hour*. Half the time of the sync was spent just checking file dates and sizes.

File comparisons are a small resource cost, but they are non-zero, and that's when you're only doing stat(). Actually reading the files and checksumming them is significantly worse.
Have a lot of them, and they add up.
If modules get more popular, you will potentially find it commonplace to have hundreds of files that need syncing, *per client*.

Puppet already has a reputation for having difficulty scaling with a single master server. It would be a shame for deliberate design choices to make that worse.
1,000-system farms are becoming commonplace. For some admins, if a product cannot reliably scale to handle that number of nodes from a single master, they view the product as not designed for their standards of scaling, and they look elsewhere.

Are you giving up that area as a design target?



Nick Lewis

May 12, 2012, 12:07:16 AM
to puppe...@googlegroups.com
This would certainly be wonderful to tackle, but it's not especially
critical, imo. Puppet isn't primarily a tool for syncing files; there
are plenty of tools which *are* designed to do that well, and they'll
work just fine in conjunction with Puppet. You can absolutely use
an rsync exec resource (as many users do) if you have a number of
files that Puppet can't handle. But, as Jeff said, for only hundreds
or thousands of files, Puppet should be fine. If *that* isn't the
case, it's definitely something we should address. For tens or hundreds
of thousands, on the other hand, I would advise using a more specialized
tool.

And from a more technical point of view, bulk operations simply aren't
something Puppet is really capable of handling today. With few
exceptions, Puppet manages resources only on an individual level. We
want to do bulk operations, but it entails significant engineering
effort.

I do think the ability for Puppet itself to use a tool like rsync
would be cool. Partly because then you don't have to, but also because
we could still report on what changed, which is what's somewhat lost
by an exec resource. This could either be as the implementation of
bulk file sourcing or, as I would prefer, an "rsync" (or similar)
resource type.

As for the concerns specifically about plugin syncing, I agree that we
probably ought to be properly using timestamps there. I disagree,
however, that timestamps would be the correct implementation for
general files. Anyway, currently pluginsync is just using file
resources with source set, which use file content rather than
timestamps by default. It should be fairly simple to use timestamps
instead when syncing plugins. Again, though, it's also probably not
realistically going to cause much of a performance issue as it is.
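For ordinary file resources, the file type already exposes this choice through its checksum attribute; whether 2.7's pluginsync honours it, and exactly how checksum interacts with source in that version, is worth verifying. A sketch, with an illustrative path:

```puppet
file { '/etc/myapp/settings.conf':
  ensure   => file,
  source   => 'puppet:///modules/myapp/settings.conf',
  checksum => mtime,  # compare timestamps instead of hashing content
}
```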
