Speed issue when copying 'n' files


Sergiu Cornea

Aug 17, 2015, 10:23:30 AM
to Puppet Users
Hello guys,

So the scenario is as follows:

On a virtual host I've got 10 Linux containers, and each container has a website associated with it. Therefore, for each website to work I have to copy around 100+ files using Puppet.

At the moment I am using rsync to do so, as I tried using Puppet to copy the files over but the Puppet agent run was considerably slow, and that's just for 10 websites, a number which is bound to grow.

Using rsync saves me some time; however, the run is still quite slow.

What do you guys suggest I should do? Or how do you go about copying a large number of files using Puppet?

Thanks,

Regards,
Sergiu

jcbollinger

Aug 18, 2015, 9:34:37 AM
to Puppet Users


On Monday, August 17, 2015 at 9:23:30 AM UTC-5, Sergiu Cornea wrote:
Hello guys,

So the scenario is as follows:

On a virtual host I've got 10 Linux containers, and each container has a website associated with it. Therefore, for each website to work I have to copy around 100+ files using Puppet.

At the moment I am using rsync to do so, as I tried using Puppet to copy the files over but the Puppet agent run was considerably slow, and that's just for 10 websites, a number which is bound to grow.


To be precise, you are using Puppet to manage ~100 files per node.  Each one should be copied only in the event that its contents on the target node differ from its contents on the master (including if the file is missing altogether).  Puppet will not copy the files if the contents are unchanged, which it checks via a configurable comparison function.

 

Using rsync saves me some time; however, the run is still quite slow.



If rsync is slow for this task then it must be that the aggregate size of your ~100 files is enormous, that your available processing power per container is meager, or both.

 
What do you guys suggest I should do? Or how do you go about copying a large number of files using Puppet?


You're not really into the realm that I'd characterize as "a large number" of files yet, but yes, Puppet is not well suited to be used as a content-management system.  You have many options both inside Puppet and out, however, to improve the observed performance.

One alternative is to use a less expensive mechanism for checking file contents by setting a different 'checksum' parameter on the File resources.  This is a tradeoff between speed and reliability, with the default, 'md5', providing maximum reliability.  You could get a moderate performance improvement without giving up too much reliability by changing to 'md5lite'.  You could get a great performance improvement at the cost of a significant reduction in reliability by choosing the 'mtime' option (thereby relying on file modification timestamps).
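
As a minimal sketch of what that might look like (the module name and paths here are placeholders, not your actual layout):

    file { '/var/www/mysite':
      ensure   => directory,
      recurse  => true,
      source   => 'puppet:///modules/mysite/content',  # placeholder fileserver path
      checksum => 'mtime',  # cheaper than the default 'md5', at the cost of reliability
    }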

If the files in question rarely change, then another alternative would be to make a package out of them for your target machines' native packaging system (RPM, Apt, ...), and manage the package instead of individual files.  This is pretty effective for installation, but not necessarily so good for catching and reverting changes to the installed files.
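
If you go that route, Puppet's side of it shrinks to a single Package resource per node, something along these lines (the package name is hypothetical):

    package { 'mysite-content':
      ensure => latest,  # or pin an explicit version; 'latest' re-checks the repository each run
    }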

Another alternative would be to enroll the files in a version-control system such as Git, and have Puppet use that to sync files with their master copies instead of managing the files as resources.  Honestly, I think this is a pretty good fit to the usage you describe.
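
As a rough sketch, assuming you install the puppetlabs/vcsrepo module and host the content in a Git repository (the repository URL and checkout path below are placeholders):

    vcsrepo { '/var/www/mysite':
      ensure   => latest,
      provider => git,
      source   => 'https://git.example.com/mysite-content.git',  # placeholder URL
      revision => 'master',
    }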


John

Sergiu Cornea

Aug 28, 2015, 8:51:42 AM
to Puppet Users
Hi John,

Thank you for your reply. So I have decided to go for RPM. However, now that I have finished my first package, I am wondering whether I need to create an RPM for each website (as the location is different for each, such as myexample.com/ and myexample1.com/), or whether I could do this by passing some arguments?

Thank you,

Regards,
Sergiu  

jcbollinger

Aug 31, 2015, 9:38:01 AM
to Puppet Users


On Friday, August 28, 2015 at 7:51:42 AM UTC-5, Sergiu Cornea wrote:
Hi John,

Thank you for your reply. So I have decided to go for RPM. However, now that I have finished my first package, I am wondering whether I need to create an RPM for each website (as the location is different for each, such as myexample.com/ and myexample1.com/), or whether I could do this by passing some arguments?



If the web sites differ only in server name, then I would advise you to (re-)write your HTML so that it uses only relative links to content within the site.  Alternatively, use some form of dynamic pages.  Either way, the content you distribute can then be identical on all the sites.

If you're talking about a small number of configuration files, on the other hand, then those few files are exactly the kind of thing that you should use Puppet to manage directly.  A common pattern is for the package to include default configuration files along with all the content, and for you to use Puppet File resources to manage just those files for each node after the package is installed (by Puppet).  Since in this case you are in control of the RPM, which is not always the case, you have the alternative of omitting from the RPM any files you want to manage directly via Puppet.
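
A rough sketch of that pattern, with hypothetical names for the package, the config path and the template:

    package { 'mysite-content':
      ensure => installed,
    }

    file { '/etc/httpd/conf.d/mysite.conf':
      ensure  => file,
      content => template('mysite/vhost.conf.erb'),  # node-specific values come from the template
      require => Package['mysite-content'],          # install the bulk content first
    }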

Bottom line: isolate the node-specific bits in a small number of small files, and manage those files with File resources.  Manage the bulk content that is common to all nodes via one of the other mechanisms.

----

If the various web sites you want to manage were substantially different from each other, on the other hand, then yes, it would make sense to package each in its own RPM.


John

Sergiu Cornea

Sep 1, 2015, 3:55:56 AM
to Puppet Users
Hi John,

Thank you for your answer.

The files I am speaking about are the Linux container files, such as the /lib64, /usr and /var directories. Hence the choice of making RPMs, as I would otherwise have to copy all those files for each website using Puppet.

Thank you,

Kind regards,
Sergiu