Downloading files matching a specific pattern from a remote HTTPS host to a local path on target servers.


Tom K.

Mar 26, 2020, 2:24:26 PM
to Ansible Project
I'm having some difficulty finding the Ansible code to do this:

Download files matching a specific pattern from a remote HTTP/HTTPS host to a local path on a target server.


So I would like to wget http://remote.com/rpm/app*.rpm to a set of target servers. According to the Ansible documentation, get_url doesn't appear to support patterns. On the other hand, with_fileglob supports wildcard patterns, but it only matches files on the Ansible controller's local filesystem, so it doesn't work with HTTP.


Looking for suggestions.


Cheers,

TK

Dick Visser

Mar 26, 2020, 7:15:10 PM
to ansible...@googlegroups.com
It's fundamentally impossible to do what you want, unless the remote
server offers some sort of file system equivalent, like a directory
index.

Dick
> --
> You received this message because you are subscribed to the Google Groups "Ansible Project" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to ansible-proje...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/42eee6fe-2134-4fd0-aa29-fc0571e9a249%40googlegroups.com.



--
Dick Visser
Trust & Identity Service Operations Manager
GÉANT

Tom K.

Mar 27, 2020, 9:09:40 AM
to Ansible Project
Thanks Dick!

I've started to get that impression after searching for quite some time.  

I'm currently using a shell command like this to download only specific files:

wget -r -nd --no-parent -A '*pattern*' http://site.com/path/to/file/  

That's why I thought it might be possible in Ansible.

Cheers,
TK



Dick Visser

Mar 27, 2020, 1:52:31 PM
to ansible...@googlegroups.com
The recursive (-r) option of wget only downloads files that are 'visible', i.e. linked from a page it has already fetched.
This works fine for things like web pages with directory index listings.
But anything that is not listed won't be magically retrieved.
If a site does not contain any links to content that is actually there, wget will not know about it and hence won't download it.
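To make the point concrete: a recursive crawl can only discover files that appear as links in the pages it fetches. Below is a minimal sketch of that discovery step, assuming the index is an HTML directory listing with double-quoted href="..." attributes (the kind Apache or nginx autoindex pages produce). The helper name extract_links is hypothetical and not part of wget.

```shell
#!/usr/bin/env bash
# Print every href target found in the HTML read from stdin, one per line.
# Assumption: links use double-quoted href="..." attributes.
# A file that exists on the server but is never linked from the page
# never shows up here, which is the same limitation wget -r has.
extract_links() {
  grep -o -E 'href="[^"]+"' | cut -d '"' -f 2
}
```

For example, `curl -s http://remote.com/rpm/ | extract_links` would list whatever that page links to, and nothing else.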

If wget works, then http://remote.com/rpm must have links to all the files.
So your best bet is to use the Ansible command module with those wget options, provided you want to use Ansible at all.

Having said that, maybe you can elaborate on what the underlying task at hand is, and/or share the real/actual URLs etc.
It might be possible to achieve the same thing in a different way.

Dick




Tom K.

Mar 27, 2020, 3:25:57 PM
to Ansible Project
Let's assume a real site:



So I used the shell command to run wget, but I want to see if there is a pure Ansible way of doing this.

Example:

# wget -r -nd --no-parent -A '*glib*' http://mirror.centos.org/centos/7/os/x86_64/Packages/

# ls -altri|grep -v glib
total 72652
201326721 dr-xr-x---. 9 root root     4096 Mar 27 15:04 ..
135239432 drwxr-xr-x. 2 root root     8192 Mar 27 15:07 .
#


When I try to list the non-glib files above using grep -v, I get nothing, because wget only kept the files with *glib* in their names. (I think it only downloads the index page, to find the links, plus the files matching *glib*, and then removes the index page.)

My use case: I have files like these at some HTTP link:

server1_file1.1
server1_file-m2
server1_file-n3

server2_file1.1.1
server2_file-a2
server2_file-z3

server3_file1.1.1.1
server3_file-abc2
server3_file-xyz3

I want to write Ansible code that will grab the correct set of files for each host. So for server3, it will only grab the server3 files from that HTTP link. Assume I absolutely need to use HTTP as the source of these files.
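Once the listing has been reduced to one file name per line, the per-host selection is a plain prefix match. A minimal sketch, assuming every name follows the "<host>_<rest>" scheme shown in the examples above; files_for_host is a hypothetical helper name. Matching on the host name plus the underscore keeps server1 from also picking up files for a host such as server10.

```shell
#!/usr/bin/env bash
# Print only the file names (read from stdin, one per line) that belong
# to the given host. Assumption: names look like "<host>_<rest>",
# e.g. "server3_file-abc2".
files_for_host() {
  local prefix="${1}_"
  local name
  while IFS= read -r name; do
    # Quoting "$prefix" makes it a literal prefix (no glob expansion), and
    # the trailing underscore keeps "server1" from matching "server10_...".
    case $name in
      "$prefix"*) printf '%s\n' "$name" ;;
    esac
  done
}
```

In a playbook the host argument would naturally be `{{ inventory_hostname }}`.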

Cheers,
TK

oxido A

Mar 27, 2020, 3:51:56 PM
to ansible...@googlegroups.com
Well, maybe you will need to read the output of curl and then grep through the HTML tags...

I do something like this:
curl -s <someUrl> | grep -o -E 'href="([^"*]+)"' | cut -d '"' -f 2 | sort -n | sed -e 's/\///' | tail -1

This will give you the name of the last file in the list of href="..." matches; in my case those are links to the files.
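Building on the curl-and-grep idea above, here is a minimal end-to-end sketch that fetches an index page, keeps only the links whose names start with "<host>_", and downloads each one. Assumptions: the index is an HTML listing with double-quoted href attributes pointing at plain file names relative to the index URL, curl is installed, and the function names fetch and fetch_host_files are hypothetical. It could be run per host through Ansible's command or shell module, as suggested earlier in the thread.

```shell
#!/usr/bin/env bash
# Print the body of a URL to stdout (-f: fail on HTTP errors, -sS: quiet but show errors).
fetch() {
  curl -fsS "$1"
}

# Usage: fetch_host_files <index_url> <host> <dest_dir>
# Downloads every file linked from <index_url> whose name starts with "<host>_".
fetch_host_files() {
  local index_url="$1" host="$2" dest="$3"
  local base="${index_url%/*}"   # directory part of the index URL
  local name
  mkdir -p "$dest"
  fetch "$index_url" \
    | grep -o -E 'href="[^"]+"' \
    | cut -d '"' -f 2 \
    | awk -v p="${host}_" 'index($0, p) == 1 && index($0, "/") == 0' \
    | while IFS= read -r name; do
        # Write to a temporary name first so a failed download leaves no
        # file under the final name; names containing "/" were dropped above.
        fetch "$base/$name" > "$dest/$name.part" && mv "$dest/$name.part" "$dest/$name"
      done
}
```

For example, `fetch_host_files http://mirror.centos.org/centos/7/os/x86_64/Packages/ server3 /tmp/pkgs` would only work if the package names actually started with "server3_"; the host-prefix convention is the assumption that makes the filter meaningful.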


Dick Visser

Mar 27, 2020, 4:16:23 PM
to ansible...@googlegroups.com
If those files are tailor-made for you, then I would work with the team providing them to find an alternative way of doing this.


What if, after generating the files, they also created a JSON file containing all the file information, keyed by host?
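With such a manifest, each host would no longer need to scrape HTML at all. A minimal sketch, assuming a hypothetical JSON layout that maps host names to lists of file names, for example {"server3": ["server3_file-abc2", "server3_file-xyz3"]}, and that jq is installed where the manifest is read; files_from_manifest is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Usage: files_from_manifest <manifest.json> <host>
# Prints the file names listed for <host>, one per line; prints nothing
# if the host is missing from the manifest.
# Assumption: manifest layout is {"<host>": ["<file>", ...], ...}.
files_from_manifest() {
  # (.[$h] // []) falls back to an empty list for unknown hosts,
  # and the trailing [] emits each array element on its own line.
  jq -r --arg h "$2" '(.[$h] // [])[]' "$1"
}
```

On the Ansible side, the same file could instead be fetched with the uri module and parsed with the from_json filter, avoiding jq entirely.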




--
Sent from a mobile device - please excuse the brevity, spelling and punctuation.