Hi,
For development questions, feel free to post in puppet-dev :)
You're not the first to be irritated by how long those md5 computations take.
That's something I've wanted to optimize for a loooong time.
It's simply quite difficult.
On 22/10/12 21:09, Bostjan Skufca wrote:
> Hi there,
>
> I'm running into slow catalog runs because of many files that are
> managed. I was thinking about some optimizations of this functionality.
>
> 1: On puppetmaster:
> For files with "source => 'puppet:///modules...' puppetmaster should
> already calculate md5 and send it with the catalog.
That's what the static compiler does, if I'm not mistaken. The static
compiler has been part of Puppet since 2.7.
> 2: On managed node:
> As md5s for files are already there once catalog is received, there is
> no need for x https calls (x is the number of files managed with
> source=> parameter)
>
> 3. Puppetmaster md5 cache
> This would of course put some strain on puppetmaster, which would then
> benefit from some sort of file md5 cache:
> - when md5 is calculated, put in into cache, key is filename. Also add
> file mtime and time of cache insert.
> - on each catalog request, for each file in the catalog check if mtime
> has changed, and if so, recalculate md5 hash, else just retrieve md5
> hash from cache
> - some sort of stale cache entries removal, based on cache insert time,
> maybe at the end of each puppet catalog compilation, maybe controlled
> with probability 1:100 or something
Actually, checking the mtime/size before doing any md5 computation could
be a big win.
But that's not all: there are in fact 3 md5 computations per file
taking place during a puppet run:
* one by the master when computing file metadata
* one by the agent on the existing file (this tells us whether the file
changed)
* and finally one after writing the change to the file, to make sure we
wrote it correctly.
A potential solution would be to implement a different checksum type
(maybe weaker than md5, but faster).
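As an illustration of what "weaker but faster" could look like, here is a small Python sketch that hashes only the beginning of the file. The function name `md5_lite` and the 512-byte default are assumptions made for this example; as far as I know the existing `md5lite` checksum type works along similar lines, but check the file type reference before relying on that.

```python
import hashlib


def md5_lite(path, prefix_bytes=512):
    """md5 of only the first prefix_bytes of the file.

    Much cheaper on big files, but blind to any change past the prefix.
    """
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read(prefix_bytes)).hexdigest()
```

Two files that differ only after the prefix get the same checksum, which is exactly the weakness you accept in exchange for the speed.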
> Do you have any comments about these optimizations? They will be greatly
> appreciated... really :)
Well, I believe we (or at least I) are very aware of those issues. The
reason it never got fixed (except by the static compiler) is that
it's complex stuff. Last time I tried to fiddle with the checksumming,
I never quite got anywhere :)
As I said at the top, feel free to chime in on puppet-dev to talk
about this, and check the various Redmine tickets regarding those issues.