Best way to sync a rarely changed folder with many files and subfolders via Puppet


Alexey Korepov

Nov 2, 2014, 4:39:23 AM
to puppet...@googlegroups.com
I need to sync one folder from the server to many clients via Puppet, but this folder contains many small files and subfolders. I'm afraid that adding this folder to Puppet in the standard way will slow down the sync process on each client.

As I understand it, on every sync Puppet rechecks each file (recomputes its md5 sum) on both the server and the client to find changes.

On the client computers this folder will be read-only, so we don't need to recheck md5 sums on every sync. On the server the files will change very rarely, too.

Can you recommend the best way to sync this folder via Puppet with minimal slowdown, traffic, and resource usage?

Tim Dunphy

Nov 2, 2014, 8:36:19 PM
to puppet...@googlegroups.com
Hello,

I've been able to use the recurse parameter of the File type successfully:

  file { "/opt/solr":
       source => "puppet:///modules/solr/solr-files",
       owner => "tomcat",
       group => "tomcat",
       recurse => true
     }

The recurse option will sync an entire folder of files. 

Tim


David Danzilio

Nov 3, 2014, 7:43:07 AM
to puppet...@googlegroups.com
Any reason you can't lay these files down with a package? I definitely recommend staying away from recursive file resources.
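
For example (a minimal, hedged sketch; 'myfiles' is a hypothetical name for a package that contains the file tree), the manifest side then collapses to a single resource:

  # Hypothetical: assumes a package named 'myfiles' holding the file
  # tree has already been built and published to a repository the
  # node can reach.
  package { 'myfiles':
    ensure => installed,
  }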

jcbollinger

Nov 3, 2014, 9:53:09 AM
to puppet...@googlegroups.com


On Sunday, November 2, 2014 3:39:23 AM UTC-6, Alexey Korepov wrote:
I need to sync one folder from the server to many clients via Puppet, but this folder contains many small files and subfolders. I'm afraid that adding this folder to Puppet in the standard way will slow down the sync process on each client.



If "via the standard way" you mean using File resources then yes, adding management of the folder in question will slow down runs.

 
As I understand it, on every sync Puppet rechecks each file (recomputes its md5 sum) on both the server and the client to find changes.

On the client computers this folder will be read-only, so we don't need to recheck md5 sums on every sync. On the server the files will change very rarely, too.



If the files were genuinely read-only then you would have a different problem: Puppet would not be able to update them when that's needed.  If Puppet (running as root) can update them, on the other hand, then it is not safe to assume that they will remain unchanged between Puppet runs.

 
Can you recommend the best way to sync this folder via Puppet with minimal slowdown, traffic, and resource usage?



There is an inherent trade-off between the agent's runtime and its reliability in determining which managed files are out of sync.  You can choose a comfortable point on that spectrum via the File resource's 'checksum' parameter.  Some of the options are considerably cheaper to compute than an md5 sum, but correspondingly more susceptible to inaccurate results (both false positives and false negatives).
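
For instance (a minimal sketch; the path and source URL here are hypothetical), mtime-based checking avoids hashing file contents on every run:

  # Hypothetical example: compare modification times instead of md5
  # sums. Cheaper per run, but a content change that preserves the
  # mtime will go undetected (a false negative).
  file { '/opt/data':
    ensure   => directory,
    recurse  => true,
    source   => 'puppet:///modules/mymodule/data',
    checksum => mtime,
  }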

On the other hand, if the overall file set changes only rarely, then you could consider building a package out of it (RPM, DEB, etc., as appropriate for your nodes), dropping it in a local repository, and managing the collection via a Package resource.  This is especially suitable if, as seems to be the case, you are confident that the files will not be locally changed between Puppet runs.  Of course, the tradeoff there is the need to build and post a new version of the package whenever files need to be changed.
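
As an illustration (hedged; the repository URL and package name below are made up), on an RPM-based node the local repository and the package could be managed together:

  # Hypothetical sketch: point the node at a local yum repository and
  # keep the package current. 'latest' re-checks the repo on every run;
  # pinning an explicit version would make updates deliberate instead.
  yumrepo { 'local-files':
    descr    => 'Local package repository',
    baseurl  => 'http://repo.example.com/local',
    enabled  => '1',
    gpgcheck => '0',
  }

  package { 'myfiles':
    ensure  => latest,
    require => Yumrepo['local-files'],
  }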


John

Thomas Bendler

Nov 3, 2014, 10:35:48 AM
to puppet-users
2014-11-03 15:53 GMT+01:00 jcbollinger <John.Bo...@stjude.org>:
[...]

As I understand it, on every sync Puppet rechecks each file (recomputes its md5 sum) on both the server and the client to find changes.

On the client computers this folder will be read-only, so we don't need to recheck md5 sums on every sync. On the server the files will change very rarely, too.
If the files were genuinely read-only then you would have a different problem: Puppet would not be able to update them when that's needed.  If Puppet (running as root) can update them, on the other hand, then it is not safe to assume that they will remain unchanged between Puppet runs.
[...]

The easiest way would be an exec statement that runs an rsync process for the synchronization, perhaps gated on a flag indicating that things changed on the server side, such as an empty file placed in the root directory every time a file changes:

# Deploy the flag directory (including the 'start' flag file) when an
# update should run. Note the fileserver mount point is 'modules',
# not 'module'.
if $updateStart {
  file { '/srv/update':
    ensure  => directory,
    recurse => true,
    purge   => true,
    force   => true,
    mode    => '0644',
    owner   => 'root',
    group   => 'root',
    source  => 'puppet:///modules/update',
  }
}

# Run the rsync only while the flag file exists, and remove the flag
# once the transfer has succeeded. The shell provider is needed so
# that '&&' in the command is interpreted.
exec { 'RsyncLocalFiles':
  provider => shell,
  command  => 'rsync -az us...@server1.example.com:/srv/files /srv/files && rm /srv/update/start',
  onlyif   => '/usr/bin/test -e /srv/update/start',
}


So you can use Puppet to deploy the flag file via a parameter when an update is needed, and the sync removes the flag once it has finished. This is just quick'n'dirty; maybe there is a more elegant way.

Regards, Thomas

Jeffrey Miller

Nov 3, 2014, 4:32:15 PM
to puppet...@googlegroups.com

Jeff-l...@uiowa.edu
