
remote-copy module?


Darren Chamberlain

Jul 19, 2012, 8:27:34 AM
to ansible...@googlegroups.com
Is there an existing module that works similar to copy but allows
one to use http resources instead of local files? Something like

$ ansible all -m remote-copy -a 'src=http://internal.host/status-dashboard.txt dest=/etc/motd'

--
Darren Chamberlain <dar...@boston.com>

Michael DeHaan

Jul 19, 2012, 8:30:21 AM
to ansible...@googlegroups.com

There is the git module, which does git checkouts.

sftp, ftp, and http:// all in one would be slick!



Jan-Piet Mens

Jul 19, 2012, 8:48:54 AM
to ansible...@googlegroups.com
> sftp, ftp, and http:// all in one would be slick!

I can give http:// a shot with httplib [1]... Shall I try? That should
be available in most installations, I think.

-JP

[1] http://docs.python.org/library/httplib.html
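For reference, httplib was renamed http.client in Python 3. A minimal sketch of the kind of fetch JP describes, assuming Python 3's http.client; split_url and fetch are hypothetical helper names, not part of Ansible:

```python
import http.client
from urllib.parse import urlsplit


def split_url(url):
    """Split an http(s) URL into (scheme, host, port, path-with-query)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, parts.port, path


def fetch(url, timeout=30):
    """GET the URL and return (status, headers, body); redirects are not followed."""
    scheme, host, port, path = split_url(url)
    if scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # dict() collapses repeated header names to the last value seen.
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()
```

Redirects are deliberately left alone here: http.client hands back the 3xx status and the caller decides what to do with it.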

Michael DeHaan

Jul 19, 2012, 8:50:38 AM
to ansible...@googlegroups.com
Sounds great!

Darren Chamberlain

Jul 19, 2012, 8:55:07 AM
to ansible...@googlegroups.com
I looked at the git module for this, but my use case is more
expansive: I am investigating using ansible to replace our
masterless puppet infrastructure, which makes extensive use of file
resources, but we are limited (almost hamstrung) by puppet's
requirement that file sources be on the local file system. We have
about 1G worth of files, mostly binaries, almost all tiny, some of
which churn quite a bit, and most of which come from third-party
vendors, so using a VCS isn't a good match for this. What I would
need is the ability to put our files behind a fast but dumb
load-balanced httpd instance.

My main concerns about a module of this type are performance- and
efficiency-related: every invocation would require either pulling
down the file to compare with the local copy, or that the remote
server be able to transmit the checksum to the client (which makes
the server less dumb). It would be nice to support all the options
that copy supports (like first_available_file) but that might not be
reasonable.

If other people see value in a module like this, I can attempt to
create one, although I would have to strongly resist the temptation
to make it simply do os.system("wget") ...


nix85

Jul 19, 2012, 9:03:05 AM
to ansible...@googlegroups.com
Another point to keep in mind: what sort of files would you allow to be copied? Are you going to filter based on the Content-Type HTTP header?

For example, will the module simply take the response of the HTTP call and, if it is 200 OK, create a file at the remote destination with the contents of the resource? Or are you going to allow only certain file types?

Jeroen Hoekx

Jul 19, 2012, 9:03:47 AM
to ansible...@googlegroups.com
Hi,

On 19 July 2012 14:55, Darren Chamberlain <dar...@boston.com> wrote:

> My main concern about a module of this type are performace- and
> efficient-related: every invocation would require either pulling
> down the file to compare with the local copy, or that the remote
> server be able to transmit the checksum to the client (which makes
> the server less dumb). It would be nice to support all the options
> that copy supports (like first_available_file) but that might not be
> reasonable.

HTTP has ETags for just this. When you GET a resource, the server
returns its ETag. If you send the If-None-Match header with the
previously received ETag, the server will respond with a 304 Not
Modified when the resource hasn't changed.

Maybe the module should store the ETag of downloaded resources in
/var/lib/ansible?

Greetings,

Jeroen
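A minimal sketch of the ETag idea, assuming Python 3's urllib.request and a small JSON file per resource holding the last ETag seen. cache_path is a hypothetical parameter (somewhere under /var/lib/ansible would be one choice), and load_etag, conditional_headers, and fetch_if_changed are illustrative names:

```python
import json
import os
import urllib.error
import urllib.request


def load_etag(cache_path):
    """Return the stored ETag for this resource, or None if there isn't one."""
    try:
        with open(cache_path) as f:
            return json.load(f).get("etag")
    except (OSError, ValueError):
        return None


def conditional_headers(etag):
    """Ask the server for a 304 if our copy still matches the stored ETag."""
    return {"If-None-Match": etag} if etag else {}


def fetch_if_changed(url, dest, cache_path, timeout=30):
    """Download url to dest unless the server reports it unchanged (304)."""
    # Only send If-None-Match when we actually have a local copy.
    etag = load_etag(cache_path) if os.path.exists(dest) else None
    req = urllib.request.Request(url, headers=conditional_headers(etag))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read()
            new_etag = resp.headers.get("ETag")
    except urllib.error.HTTPError as e:
        if e.code == 304:  # urllib raises HTTPError for 304 Not Modified
            return False
        raise
    with open(dest, "wb") as f:
        f.write(body)
    # Remember the ETag (None if the server sent none) for the next run.
    with open(cache_path, "w") as f:
        json.dump({"etag": new_etag}, f)
    return True
```

If a server sends no ETag, this degrades to a plain download every run, which is the defensive fallback Darren raises in his reply.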

nix85

Jul 19, 2012, 9:09:32 AM
to ansible...@googlegroups.com
Why not just write an rsync module and use it with ansible to sync directories? You can have a single source-of-truth server that holds all the files in one place (with proper backup and RAID, of course). You can then write a module that runs an rsync command locally on that host.

Basically, you could have an rsync module that you would probably use as follows
===
ansible webservers -m rsync -a "src=/path/to/src dest=/path/to/dest" -c local
===

Or, you can use the command module to run rsync command
===
ansible webservers -m command -a "rsync src/path dest/path" -c local
===

Please correct me if I am wrong.
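A minimal sketch of what such an rsync module might run under the hood, assuming it executes on the source-of-truth server and pushes to each managed host over ssh. rsync_argv and run_rsync are hypothetical names; --archive, --compress, --delete, and --itemize-changes are standard rsync options:

```python
import subprocess


def rsync_argv(src, host, dest, delete=False, compress=True):
    """Build an rsync command that pushes src to host:dest over ssh."""
    argv = ["rsync", "--archive", "--itemize-changes"]
    if compress:
        argv.append("--compress")
    if delete:
        # Remove files on the target that no longer exist in src.
        argv.append("--delete")
    argv += [src, "%s:%s" % (host, dest)]
    return argv


def run_rsync(src, host, dest, **opts):
    """Run rsync and return its itemized output; raises if rsync fails."""
    result = subprocess.run(rsync_argv(src, host, dest, **opts),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, rsync_argv("/srv/files/", "web1", "/var/www") builds ["rsync", "--archive", "--itemize-changes", "--compress", "/srv/files/", "web1:/var/www"]; the trailing slash on src copies the directory's contents rather than the directory itself.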

Darren Chamberlain

Jul 19, 2012, 9:11:21 AM
to ansible...@googlegroups.com
My gut feeling on this is that for 200s the content gets written,
for 304s the content is unchanged, 30x redirects are followed based
on a parameter (follow=true/false), and everything else is treated
as an error (in the same way that copy chokes if the src= is
missing). But there are a huge number of potential parameters for a
module like this: ssl/cert handling, retries, timeouts, following
redirects to untrusted domains, content-type filtering, etc.

--
Darren Chamberlain <dar...@boston.com>
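The status-code policy above fits in a small decision function. A sketch, assuming the hypothetical follow=true/false parameter Darren mentions; classify_response is an illustrative name:

```python
def classify_response(status, follow_redirects=False):
    """Map an HTTP status code to an action under the policy described above.

    Returns "write", "unchanged", or "redirect"; anything else is an error,
    much as copy fails when src= is missing.
    """
    if status == 200:
        return "write"
    if status == 304:
        return "unchanged"
    if status in (301, 302, 303, 307, 308):
        if follow_redirects:
            return "redirect"
        raise RuntimeError("redirect (%d) not followed: follow=false" % status)
    raise RuntimeError("unexpected HTTP status %d" % status)
```

Keeping the policy separate from the network code makes it easy to test, and the other parameters listed (timeouts, retries, SSL handling) can be layered around it.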

Darren Chamberlain

Jul 19, 2012, 9:15:01 AM
to ansible...@googlegroups.com
Yes, ETags and If-None-Match are definitely a good idea. I was
thinking more of the case where these aren't sent as part of the
response, or aren't honored as part of the request; in a module like
this, which would request arbitrary content from arbitrary servers,
you have to be defensive about everything and assume the worst.

--
Darren Chamberlain <dar...@boston.com>

Michael DeHaan

Jul 19, 2012, 9:16:22 AM
to ansible...@googlegroups.com
I almost think that, since there are so many possible ways to do this, the shell module calling curl with the creates= parameter might be the way to go.

Though you won't know whether the file is too old. Using wget/curl to simulate yum also seems like a bad course to take, and we also have the git module.

What use case am I missing?
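The shell + creates= approach boils down to the logic below, sketched in Python for illustration. download_unless_exists is a hypothetical name, and fetch stands in for whatever curl or urllib call does the download. It also makes the staleness caveat concrete: once dest exists, it is never refreshed.

```python
import os


def download_unless_exists(dest, fetch):
    """Mimic shell + creates=: skip if dest exists, otherwise fetch and write it.

    `fetch` is any callable returning the file's bytes. An existing but
    stale dest is never refreshed, because only existence is checked.
    """
    if os.path.exists(dest):
        return "skipped"  # fetch is never called
    data = fetch()
    with open(dest, "wb") as f:
        f.write(data)
    return "changed"
```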




Darren Chamberlain

Jul 19, 2012, 9:23:19 AM
to ansible...@googlegroups.com
Yes, the shell module + creates= is definitely a possibility for
this, and I didn't think of it when I was formulating my use case
and initial question. The only thing it misses is the ability to do
things like first_available_file, but that's not a deal-breaker for
me.


--
Darren Chamberlain <dar...@boston.com>

Michael DeHaan

Jul 19, 2012, 9:24:34 AM
to ansible...@googlegroups.com
first_available_file looks at the local filesystem anyway, so it works exactly the same for all modules. It would not be able to talk to a remote resource.





nix85

Jul 19, 2012, 10:04:58 AM
to ansible...@googlegroups.com
I would think that rsync would be a lot more efficient in achieving what you need. Use rsync with the shell module or write your own module to sync files across.

Timothy Appnel

Jul 19, 2012, 11:13:58 AM
to ansible...@googlegroups.com, ansible...@googlegroups.com
I've dealt with ETags and If-Modified-Since headers quite a bit in a past life. I've found that most web servers send an ETag or a Last-Modified header, and usually both.

It's always good to be defensive, but this may be a bit premature to worry about. Hit some of the servers you'll be working with and check the response headers to see whether they are present. I'm guessing they will be.

I'd also suggest looking at compressed (gzip) content. With a bit of configuration, Apache (and, I believe, any modern web server) can transparently gzip files and send them to clients that say they will accept them.

http://www.diveintopython.net/http_web_services/gzip_compression.html
http://www.diveintopython.net/http_web_services/http_features.html#d0e27724
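A minimal client-side sketch of the gzip suggestion, assuming Python 3's urllib.request, which does not decompress responses by itself. decode_body and fetch_gzip are hypothetical names:

```python
import gzip
import urllib.request


def decode_body(body, content_encoding):
    """Undo transparent gzip compression if the server applied it."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch_gzip(url, timeout=30):
    """GET url, advertising gzip support, and return the decoded bytes."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Servers that ignore Accept-Encoding simply send the plain body, and decode_body passes it through unchanged.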

<tim/>


Timothy Appnel

Jul 19, 2012, 11:36:57 AM
to ansible...@googlegroups.com, ansible...@googlegroups.com
Funny you mention it, Darren. I was just thinking about this last night.

My reasoning is different from Darren's, though. I'm writing out a whole lot of interconnected configuration files that sit in various subdirectories, and I want to copy a (potential) batch of them to remote servers.

The copy module or even this hypothetical remote copy module will work great with a file or directory or a static list thereof. That's not what I'm dealing with though.

These generated files don't make sense in version control because they can be reproduced easily.

In the past I ran an rsync command and let it do its magic: creating paths and files, using compression, and transferring only the files that changed.

While I could use the command module for rsync, I like having a module that can apply some smart defaults and return better information.
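One way such a module could return better information is to run rsync with --itemize-changes and summarise its output. A sketch under that assumption; parse_itemized and the shape of its result are hypothetical, and only the leading update-type character of each itemize code is inspected:

```python
def parse_itemized(output):
    """Summarise `rsync --itemize-changes` output into changed/deleted paths.

    Assumes the itemize format: an update code such as "<f+++++++++" or
    "cd+++++++++", a space, then the path; removals appear as "*deleting".
    Lines starting with "." (no content transfer) are ignored.
    """
    changed, deleted = [], []
    for line in output.splitlines():
        if not line.strip():
            continue
        code, _, path = line.partition(" ")
        path = path.strip()
        if code == "*deleting":
            deleted.append(path)
        elif code[:1] in ("<", ">", "c", "h"):
            # sent, received, created locally, or hard-linked
            changed.append(path)
    return {"changed": bool(changed or deleted), "files": changed, "deleted": deleted}
```

A result like this maps naturally onto Ansible's changed flag, which is exactly what a bare command invocation can't report.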

Seth Vidal

Jul 19, 2012, 5:29:24 PM
to ansible...@googlegroups.com



On Thu, 19 Jul 2012, Darren Chamberlain wrote:

> I looked at the git module for this, but my use case is more
> expansive: I am investigating using ansible to replace our
> masterless puppet infrastructure, which makes extensive use of file
> resources, but we are limited (almost hamstrung) by puppet's
> requirement that file sources be on the local file system. We have
> about 1G worth of files, mostly binaries, almost all tiny, some
> which churn quite a bit, and most of which come from third party
> vendors, so using a VCS isn't a good match for this. What I would
> need is the ability to put our files behind a fast but dumb
> load-balanced httpd instance.
>
> My main concern about a module of this type are performace- and
> efficient-related: every invocation would require either pulling
> down the file to compare with the local copy, or that the remote
> server be able to transmit the checksum to the client (which makes
> the server less dumb). It would be nice to support all the options
> that copy supports (like first_available_file) but that might not be
> reasonable.
>
> If other people see value in a module like this, I can attempt to
> create one, although I would have to strongly resist the temptation
> to make it simply do os.system("wget") ...
>

So it sounds to me like you have a master source for all these files.

If so, why not set up all your clients as read-only backup clients?
Then, with ansible, provision them with the set of files that they
actually need/care about.

And restore them into place.

You could do ALL of that as a module if you wanted, or by using just the
command module.

-sv

Seth Vidal

Jul 19, 2012, 5:33:38 PM
to ansible...@googlegroups.com



On Thu, 19 Jul 2012, Darren Chamberlain wrote:

> Is there an existing module that works similar to copy but allows
> one to use http resources instead of local files? Something like
>
> $ ansible all -m remote-copy -a 'src=http://internal.host/status-dashboard.txt dest=/etc/motd'


Also - this
http://fedorapeople.org/cgit/skvidal/public_git/scripts.git/tree/copy_if_changed.py

could be adapted pretty easily.
It uses urlgrabber, but swapping that out for urllib[2] wouldn't be
terrible.
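An independent sketch of the same general idea (not the contents of copy_if_changed.py): download to a temporary file beside dest using urllib.request in place of urlgrabber, compare SHA-1 checksums, and replace dest only if they differ. copy_if_changed, install_if_changed, and sha1_of are hypothetical names:

```python
import hashlib
import os
import tempfile
import urllib.request


def sha1_of(path):
    """Return the SHA-1 hex digest of a file, or None if it doesn't exist."""
    if not os.path.exists(path):
        return None
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def install_if_changed(tmp_path, dest):
    """Move tmp_path over dest if the contents differ; otherwise discard it."""
    if sha1_of(tmp_path) == sha1_of(dest):
        os.remove(tmp_path)
        return False
    # Atomic rename when tmp_path and dest are on the same filesystem.
    # Note: mkstemp files are mode 0600; a real module would fix permissions.
    os.replace(tmp_path, dest)
    return True


def copy_if_changed(url, dest, timeout=30):
    """Download url next to dest, then install it only if it changed."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dest)))
    try:
        with os.fdopen(fd, "wb") as out, \
                urllib.request.urlopen(url, timeout=timeout) as resp:
            out.write(resp.read())
    except Exception:
        os.remove(tmp_path)  # don't leave partial downloads behind
        raise
    return install_if_changed(tmp_path, dest)
```

This pays the full download cost on every run, which is the overhead Darren was worried about; a conditional GET with If-None-Match avoids it when the server supports ETags.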


-sv

Jan-Piet Mens

Jul 20, 2012, 7:03:44 AM
to ansible...@googlegroups.com
> Sounds great!

OK, I dared. Pull request [1] is en-route to you.

This was exciting ... ;-)
Oh, and I'm using your new MODULE_MAGIC thingy, which ROCKS!

-JP


[1] https://github.com/ansible/ansible/pull/634

Michael DeHaan

Jul 20, 2012, 7:15:30 AM
to ansible...@googlegroups.com


On Friday, July 20, 2012 at 7:03 AM, Jan-Piet Mens wrote:

> > Sounds great!
>
> OK, I dared. Pull request [1] is en-route to you.
>
> This was exciting ... ;-)
> Oh, and I'm using your new MODULE_MAGIC thingy, which ROCKS!


Glad to hear it, and very cool.

(I made a few comments about tweaks, nothing major as this looked good)

I want to get a little bit more feedback about the module import thing before we commit to it as an API, but if so, you can just resubmit it in a bit.

I want to avoid having to change all the modules if we change the API signature. If I don't hear anything today, we will consider it final, and can add functions, but will not take any away.



