Synchronizing many files (more than 100,000) with "puppet agent -t" is slow


Seokhee Kim

Nov 17, 2014, 6:32:44 PM
to puppet...@googlegroups.com
Hi,
We have a somewhat special use case for Puppet: we need to sync a very large number of files, but only a few of them change on any given run (created, updated, or deleted).
Is there any way to sync only the changed files? Right now I am syncing the whole directory with the purge option, and I've noticed it gets slower and slower as the number of files grows.
So it seems like syncing only the list of changed files would be a better approach.
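
The kind of resource I mean is roughly this (paths are just placeholders, not our real manifest):

    file { '/srv/app/data':
      ensure  => directory,
      source  => 'puppet:///modules/app/data',
      recurse => true,
      purge   => true,   # delete unmanaged files under the directory
      force   => true,   # let purge remove subdirectories too
    }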

Thanks,
Seokhee

Trevor Vaughan

Nov 17, 2014, 8:41:56 PM
to puppet...@googlegroups.com
Hi Seokhee,

You might want to move to using rsync for this type of activity: https://github.com/onyxpoint/pupmod-rsync.
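
Even a bare rsync wrapped in an exec gets you most of the benefit; a rough sketch (hypothetical host and paths, and not the module's actual interface) would be:

    exec { 'sync_app_data':
      command => 'rsync -a --delete rsync://fileserver.example.com/app-data/ /srv/app/data/',
      path    => ['/usr/bin', '/bin'],
      # runs on every agent run, but rsync itself only transfers files that changed
    }

The linked module just wraps this kind of thing up more cleanly.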

Thanks,

Trevor




--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
tvau...@onyxpoint.com

-- This account not approved for unencrypted proprietary information --

jcbollinger

Nov 18, 2014, 1:55:58 PM
to puppet...@googlegroups.com
The cost of such an operation must scale with the number of files, unless you know in advance of each run, through some out-of-band mechanism, what updates need to be performed.  If you don't know that then you have no alternative but to touch or at least examine every one, either to determine whether it needs to be updated or to blindly overwrite it.  You can make that *cheaper* by various means, such as changing the Files' 'checksum' parameter or managing the files via rsync instead, but such options still scale the same way.
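
For example, something along these lines (paths invented) switches the per-file comparison from md5 hashing to an mtime check:

    file { '/srv/app/data':
      ensure   => directory,
      source   => 'puppet:///modules/app/data',
      recurse  => true,
      purge    => true,
      checksum => mtime,  # compare timestamps instead of hashing each file's content
    }

Whether mtime comparison actually helps with puppet:/// file serving depends on your Puppet version, and either way the agent still has to examine all 100,000 files.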

There is in fact a way to scale with the number of changes instead of with the number of files, however: manage the files via a version-control system (git, svn, mercurial, etc.) instead of directly via Puppet.  This has overhead in the form of disk space, of course, and Puppet will not be able to report as precisely on changes performed.  Also you will not be able to have Puppet relationships with individual files from the collection, because they are not represented to Puppet as individual resources.


John

Seokhee Kim

Nov 18, 2014, 6:50:05 PM
to puppet...@googlegroups.com
Thanks for the information.
I just ran a test with the Enterprise version on the same machine and was surprised:
the Enterprise version took 2 minutes for 10,000 files,
while the open source version took 12 minutes.
I guess the open source version needs some tuning, but I still have to figure out what.
I will try rsync and post the result.

Thanks,
Seokhee


Or are there any better ideas for syncing about 100,000 files?
It's too slow right now: it takes more than an hour, and we need it to finish in under 3 minutes for live updates.

Thanks,
Seokhee

Seokhee Kim

Nov 19, 2014, 2:07:19 AM
to puppet...@googlegroups.com
OK, it seems rsync is the way to go: it took 18 seconds for 10,000 files.

Thanks,
Seokhee

waz0wski

Nov 21, 2014, 3:07:53 PM
to puppet...@googlegroups.com
+1 on using a VCS like Git. Puppet is not the right tool for this kind of
operation.

The vcsrepo Puppet module can be used to check out and update a repository of
files, and as jcbollinger said, that scales with the number of changed files
rather than re-deploying or rsyncing everything.

https://forge.puppetlabs.com/puppetlabs/vcsrepo
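
A minimal sketch (repository URL and target path invented):

    vcsrepo { '/srv/app/data':
      ensure   => latest,
      provider => git,
      source   => 'https://git.example.com/app-data.git',
      revision => 'master',
    }

A git fetch only transfers the objects that actually changed, so agent runs stay fast regardless of how many files are in the tree.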