HTTP as a source for files


Greg

Jun 30, 2009, 10:14:36 PM
to Puppet Users
Hi all,

I've been looking into having Puppet deploy some larger files, and I'm
noticing that it ties up the puppetmasters quite a bit and can often
result in a timeout if the file is too large. Before I submit a
feature request for an HTTP method for file sources, I thought I'd throw
it out to the group and see if anyone has any thoughts on it.

Possible benefits of an HTTP source could be shifting some of the file
serving load off Puppet on larger installs (especially useful if you need
to deploy larger files like packages). Another possible benefit could
be the ability to generate files using sources external to Puppet -
i.e. a CGI script on a remote server generates the config file (yes, I know
you could do it with a template, but maybe that isn't a good fit in all
cases...)

The main question would be in terms of how to detect file changes
without a full transfer - HTTP does provide some mechanisms for
checking this, but I'm not sure if they would be adequate if scripting
responses through HTTP...

My main reason for wanting this is to allow me to deploy files through
firewalls where NFS is not possible (i.e. a DMZ) and Puppet is not able
to keep up...

For those wanting the specific example: I'm deploying Sun Explorer
via its install_stb.sh file (30+ MB), then executing it to install
the packages. The install file fails to download when using
"puppet:///explorer/install_stb.sh" as a source, as it takes too long and
Puppet cuts it off with a timeout.
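
For reference, the resource looks roughly like this (the local path and
mode are from memory, so treat it as a sketch rather than my exact
manifest):

# Current approach: pull the 30+ MB script from the puppetmaster's
# "explorer" file server mount. This is the transfer that times out.
file { '/var/tmp/install_stb.sh':
  ensure => file,
  source => 'puppet:///explorer/install_stb.sh',
  mode   => '0755',
}

What I'm imagining is that "source" could also take a plain HTTP URL, so
the bytes come from an ordinary web server instead. Purely hypothetical
syntax - this does not work today, and the host name is made up:

# Hypothetical: an http:// source is not supported at the moment.
# file { '/var/tmp/install_stb.sh':
#   ensure => file,
#   source => 'http://fileserver.example.com/explorer/install_stb.sh',
# }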

Yes, I know I can do it through a package resource, but the SUNWsneep
component won't install that way - running "pkgadd -d http://server/SUNWsneep.pkg
-a adminfile -r responsefile" does not work, as pkgadd refuses to use a
response file with an HTTP datastream.
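
(For the curious, the package resource I tried looked something like the
following - parameter names are from memory, so double-check them against
the package type docs, and the file paths are placeholders for my real
admin and response files:)

# What I tried via a package resource. pkgadd won't combine an HTTP
# datastream with a response file, so SUNWsneep fails to install this way.
# The adminfile/responsefile paths below are placeholders.
package { 'SUNWsneep':
  ensure       => installed,
  provider     => sun,
  source       => 'http://server/SUNWsneep.pkg',
  adminfile    => '/var/tmp/adminfile',
  responsefile => '/var/tmp/responsefile',
}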

So what does everyone think? Is an HTTP source for files feasible?

Peter Meier

Jul 1, 2009, 3:19:38 AM
to puppet...@googlegroups.com
Hi

> I've been looking into having Puppet deploy some larger files, and I'm
> noticing that it ties up the puppetmasters quite a bit and can often
> result in a timeout if the file is too large. Before I submit a
> feature request for an HTTP method for file sources, I thought I'd throw
> it out to the group and see if anyone has any thoughts on it.

yes, this is the main reason why >= 0.25.0 will use REST instead of XMLRPC,
which requires escaping all file data into an XML format.

The current rule of thumb is not to deploy larger files with puppet <
0.25.0. :(

> [...]


>
> So what does everyone think? Is an HTTP source for files feasible?

as far as I understood, with REST this would all be possible, and Luke
explicitly mentioned that it would even be possible to serve files
natively with apache (for example), so no ruby stack overhead at all.
However I haven't yet seen any example that does that, nor how to set it
up. But it's definitely already the idea, even if you can't find a ticket
for it.

Maybe try out 0.25.0 beta 2 to see if it works better - it definitely
should, and according to reports it does!

cheers pete

Marc Fournier

Jul 1, 2009, 4:06:48 AM
to puppet...@googlegroups.com

Hello,

> I've been looking into having Puppet deploy some larger files, and I'm
> noticing that it ties up the puppetmasters quite a bit and can often
> result in a timeout if the file is too large. Before I submit a
> feature request for an HTTP method for file sources, I thought I'd throw
> it out to the group and see if anyone has any thoughts on it.
>

> [...]

I'm convinced we could benefit from having other file sources than
file:// and puppet://. There already is a (similar) ticket for this:
http://projects.reductivelabs.com/issues/184

You might also be interested by Luke Kanies's reply to more or less the
same question on puppet-dev a few weeks ago:
http://groups.google.com/group/puppet-dev/browse_thread/thread/275658354cd45bab/60b7672fbc35c371

I've started working on this (but unfortunately got preempted, so it has
stalled for now). It shouldn't be too difficult to implement, but for my
part, my knowledge of ruby is currently too low to do this
efficiently :-(

Marc


Julian Simpson

Jul 1, 2009, 4:48:14 AM
to puppet...@googlegroups.com
I like the idea of HTTP if it gets me closer to stubbing out the
puppetmaster when I'm developing manifests. I'm thinking I could stand up
a WEBrick server to resolve all the file sources. Of course, I'd use
Apache or Nginx in production.

J.

2009/7/1 Marc Fournier <marc.f...@camptocamp.com>:
--
Julian Simpson
Software Build and Deployment
http://www.build-doctor.com

Robin Sheat

Jul 1, 2009, 6:13:35 AM
to puppet...@googlegroups.com
On Wednesday 01 July 2009 14:14:36 Greg wrote:
> The main question would be in terms of how to detect file changes
> without a full transfer - HTTP does provide some mechanisms for
> checking this, but I'm not sure if they would be adequate if scripting
> responses through HTTP...

I use S3 as a file source for my larger files; it allows contents to be
verified by MD5. My code for this is available here:
https://code.launchpad.net/~eythian/+junk/ec2facts
It's pretty basic, but gets the job done.

I mention this because a similar approach should be usable when backed by
HTTP and Apache. You could either do a HEAD request with 'If-Modified-Since',
and ensure that when you save the file, you update its timestamp to the one
supplied by apache, or check whether apache will provide the MD5 (or
whatever) hash of the file contents. If the HEAD request indicates that there
is an updated version, then you pull it down using wget or similar.
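
As a rough, untested sketch of the If-Modified-Since variant wrapped in an
exec (the host name and local path are made up, and it assumes curl is
available on the client - curl happens to have flags for exactly this):

exec { 'fetch-install_stb':
  # -z <file>: only transfer if the remote copy is newer than the local
  #            file's mtime (curl sends If-Modified-Since for us).
  # -R:        stamp the saved file with the server's Last-Modified time,
  #            so the next run compares against the server's timestamp.
  command => '/usr/bin/curl -s -R -z /var/tmp/install_stb.sh -o /var/tmp/install_stb.sh http://fileserver.example.com/explorer/install_stb.sh',
  path    => ['/usr/bin', '/bin'],
}

The exec will still show up as run on every catalog run, but the big
transfer only happens when the server reports a newer file.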

--
Robin <ro...@kallisti.net.nz> JabberID: <eyt...@jabber.kallisti.net.nz>
http://www.kallisti.net.nz/blog ||| http://identi.ca/eythian

PGP Key 0xA99CEB6D = 5957 6D23 8B16 EFAB FEF8 7175 14D3 6485 A99C EB6D


David Schmitt

Jul 1, 2009, 6:17:03 AM
to puppet...@googlegroups.com
Robin Sheat wrote:
> On Wednesday 01 July 2009 14:14:36 Greg wrote:
>> The main question would be in terms of how to detect file changes
>> without a full transfer - HTTP does provide some mechanisms for
>> checking this, but I'm not sure if they would be adequate if scripting
>> responses through HTTP...
>
> I use S3 as a file source for my larger files; it allows contents to be
> verified by MD5. My code for this is available here:
> https://code.launchpad.net/~eythian/+junk/ec2facts
> It's pretty basic, but gets the job done.
>
> I mention this because a similar approach should be usable when backed by
> HTTP and Apache. You could either do a HEAD request with 'If-Modified-Since',
> and ensure that when you save the file, you update its timestamp to the one
> supplied by apache, or check whether apache will provide the MD5 (or
> whatever) hash of the file contents. If the HEAD request indicates that there
> is an updated version, then you pull it down using wget or similar.

The two classical approaches to this are either properly configured ETag
support, or putting the checksum in the filename and never refetching
a file unless its filename has changed.
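
The second approach is easy to express with a plain exec - here's a quick
sketch (the checksum value, host name and paths are invented for
illustration):

# Checksum-in-the-filename sketch: the URL changes whenever the content
# changes, so a simple "creates" guard is enough to avoid refetching.
# The checksum value and host below are made up.
$csum = '0123456789abcdef0123456789abcdef'

exec { "fetch-install_stb-${csum}":
  command => "/usr/bin/wget -q -O /var/tmp/install_stb-${csum}.sh http://fileserver.example.com/explorer/install_stb-${csum}.sh",
  creates => "/var/tmp/install_stb-${csum}.sh",
}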


Regards, DavidS

Greg

Jul 1, 2009, 11:39:32 PM
to Puppet Users
Just did a quick search. Looks like you can put an MD5 checksum into the
headers with Apache quite easily:
http://httpd.apache.org/docs/2.2/mod/core.html#contentdigest

Haven't played with it yet, but the doco does indicate a bit of a
performance hit, as it doesn't cache the checksums... Not surprising,
since content could be dynamically generated.
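
If the Apache config on the file server ends up being puppet-managed
anyway, flipping it on could be as simple as something like this (the
conf.d path and service name assume a Debian-style Apache layout - adjust
to taste):

# Sketch: drop in a one-line fragment to enable Content-MD5 headers.
# Path and service name assume a Debian-style apache2 install.
file { '/etc/apache2/conf.d/content-digest.conf':
  content => "ContentDigest On\n",
  notify  => Service['apache2'],
}

service { 'apache2':
  ensure => running,
}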

Greg