unarchive is slow to decompress very big tar.gz file


Check Peck

Feb 28, 2018, 12:24:23 AM
to Ansible Project
I have the playbook below, which I am trying to run, but it takes forever. The size of "test.tar.gz" is 100GB.

---
- hosts: TEST_BOX
  serial: 1
  tasks:
      - name: copy and untar latest tar.gz file
        unarchive: src=test.tar.gz dest=/data/tasks/files/

I am using Ansible 2.4.3.0. Any thoughts on how I can make it faster?

Check Peck

Mar 1, 2018, 3:20:45 PM
to Ansible Project
Does anyone have any thoughts on this?

Kai Stian Olstad

Mar 1, 2018, 3:26:05 PM
to ansible...@googlegroups.com
On Thursday, 1 March 2018 21.20.44 CET Check Peck wrote:
> Does anyone have any thoughts on this?

Maybe.


> On Tuesday, February 27, 2018 at 9:24:23 PM UTC-8, Check Peck wrote:
> >
> > I have the playbook below, which I am trying to run, but it takes forever.
> > The size of "test.tar.gz" is 100GB.

What is taking forever, the copying or the untar-ing?


> > ---
> > - hosts: TEST_BOX
> >   serial: 1
> >   tasks:
> >       - name: copy and untar latest tar.gz file
> >         unarchive: src=test.tar.gz dest=/data/tasks/files/
> >
> >
> > I am using Ansible 2.4.3.0. Any thoughts on how I can make it faster?

It depends on where the bottleneck is.
You are copying from your client; if that has a slow connection, you could put the file on another machine/server and fetch it from there instead.
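
A minimal sketch of that idea, assuming the archive has been published at a hypothetical URL (https://files.example.com/test.tar.gz is a placeholder, not a real location) that the target host can reach. When remote_src is yes and src contains ://, unarchive has the target download the file itself instead of copying it from the control machine. The archive is still written to the target's disk before it is unpacked, so this only helps if the copy is the slow part.

```yaml
---
- hosts: TEST_BOX
  serial: 1
  tasks:
    - name: download and untar latest tar.gz file on the target
      unarchive:
        # Hypothetical URL; replace with wherever the archive is actually hosted.
        src: https://files.example.com/test.tar.gz
        dest: /data/tasks/files/
        # With remote_src and a URL in src, the target host fetches the file itself.
        remote_src: yes
```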


--
Kai Stian Olstad

Check Peck

Mar 2, 2018, 1:01:12 PM
to Ansible Project
Is there any way to avoid copying? Can we not just untar over the wire?

Kai Stian Olstad

Mar 2, 2018, 1:18:40 PM
to ansible...@googlegroups.com
On Friday, 2 March 2018 19.01.12 CET Check Peck wrote:
> Is there any way to avoid copying? Can we not just untar over the wire?

Not with unarchive.

If you put the file on a web server you could do this

- shell: curl -s https://some.url/test.tar.gz | tar xzC /dest

But this solution is not idempotent: it downloads and extracts the archive on every run.
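
One possible way to get closer to idempotent behaviour, sketched here under a few assumptions: the marker file /dest/.test.tar.gz.extracted is a hypothetical name, bash is available on the target, and curl's -f flag is added so HTTP errors fail the task. The creates argument makes Ansible skip the task once the marker exists, and the marker is only written if both curl and tar succeed.

```yaml
- name: stream and untar test.tar.gz without storing the archive
  shell: |
    # Make the pipeline fail if curl fails, not only if tar fails.
    set -o pipefail
    curl -sf https://some.url/test.tar.gz | tar xzC /dest && touch /dest/.test.tar.gz.extracted
  args:
    # pipefail needs bash rather than plain sh.
    executable: /bin/bash
    # Hypothetical marker file: the task is skipped when it already exists.
    creates: /dest/.test.tar.gz.extracted
```

This only records that an extraction happened once. It does not notice when the archive behind the URL changes, so the marker would have to be removed to force a fresh extraction.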

--
Kai Stian Olstad

Check Peck

Mar 2, 2018, 2:54:16 PM
to Ansible Project
OK, got it. But if I try it this way, using the shell module:

---
- hosts: TEST_BOX
  serial: 1
  tasks:
      - name: copy and untar latest deals.tar.gz file
        shell: "cd /data/tasks/files/; tar -xvzf test.tar.gz"

It doesn't work, and I get the error below. Am I doing anything wrong in the task above?

fatal: [machine_abc]: FAILED! => {"changed": true, "cmd": "cd /data/tasks/files/; tar -xvzf test.tar.gz", "delta": "0:00:00.022610", "end": "2018-03-02 12:47:56.840245", "msg": "non-zero return code", "rc": 2, "start": "2018-03-02 12:47:56.817635", "stderr": "tar (child): test.tar.gz: Cannot open: No such file or directory\ntar (child): Error is not recoverable: exiting now\ntar: Child returned status 2\ntar: Error is not recoverable: exiting now", "stderr_lines": ["tar (child): test.tar.gz: Cannot open: No such file or directory", "tar (child): Error is not recoverable: exiting now", "tar: Child returned status 2", "tar: Error is not recoverable: exiting now"], "stdout": "", "stdout_lines": []}

Kai Stian Olstad

Mar 2, 2018, 3:24:07 PM
to ansible...@googlegroups.com
On Friday, 2 March 2018 20.54.15 CET Check Peck wrote:
> OK, got it. But if I try it this way, using the shell module:
>
> ---
> - hosts: TEST_BOX
>   serial: 1
>   tasks:
>       - name: copy and untar latest deals.tar.gz file
>         shell: "cd /data/tasks/files/; tar -xvzf test.tar.gz"
>
>
> It doesn't work, and I get the error below. Am I doing anything wrong
> in the task above?
>
> fatal: [machine_abc]: FAILED! => {"changed": true, "cmd": "cd
> /data/tasks/files/; tar -xvzf test.tar.gz", "delta": "0:00:00.022610",
> "end": "2018-03-02 12:47:56.840245", "msg": "non-zero return code", "rc":
> 2, "start": "2018-03-02 12:47:56.817635", "stderr": "tar (child):
> test.tar.gz: Cannot open: No such file or directory\ntar (child): Error is
> not recoverable: exiting now\ntar: Child returned status 2\ntar: Error is
> not recoverable: exiting now", "stderr_lines": ["tar (child): test.tar.gz:
> Cannot open: No such file or directory", "tar (child): Error is not
> recoverable: exiting now", "tar: Child returned status 2", "tar: Error is
> not recoverable: exiting now"], "stdout": "", "stdout_lines": []}

You need to transfer the file to the host first.
That's why I used curl in my example: it fetches the file and pipes it in memory to tar, which extracts the files and saves them to disk.
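
For comparison, a rough sketch of the two-step variant, assuming test.tar.gz sits next to the playbook on the control machine. It will not be faster than unarchive, since the file is still copied in full; it only shows why the shell task alone fails: the shell command runs on the host, where the archive does not exist until something transfers it.

```yaml
---
- hosts: TEST_BOX
  serial: 1
  tasks:
    # Step 1: the archive only exists on the control machine, so send it over.
    - name: copy test.tar.gz to the host
      copy:
        src: test.tar.gz
        dest: /data/tasks/files/test.tar.gz

    # Step 2: now the file exists on the host and tar can open it.
    - name: untar test.tar.gz on the host
      command: tar -xzf test.tar.gz
      args:
        chdir: /data/tasks/files/
```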


--
Kai Stian Olstad

Yevgen Lasman

Jun 3, 2018, 9:47:19 AM
to Ansible Project
I have the same problem, and I noticed that for my .tar.bz2 archive the following command is called:
/bin/gtar --list -C /data/sync --show-transformed-names --use-compress-program=pbzip2 --exclude=data/tmp -f /tmp/data-backup.tar.bz2

and it indeed takes twice as long, presumably because of the --list option.

The playbook part is:

- name: Unpack data dump
  unarchive:
    remote_src: yes
    src: /tmp/data-backup.tar.bz2
    dest: "{{ destination }}/sync"
    extra_opts: ["--use-compress-program=pbzip2","--exclude=data/tmp"]
  tags:
    - restore
    - data-restore

and having --list in the command line does not make any sense, since I didn't ask for it!

Yevgen Lasman

Jun 3, 2018, 9:55:59 AM
to Ansible Project
And the next command it calls is:

/bin/gtar --diff -C /data/sync --show-transformed-names --use-compress-program=pbzip2 --exclude=data/tmp -f /tmp/data-backup.tar.bz2
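
These extra passes look like unarchive's own bookkeeping: it appears to list the archive (--list) and compare it against dest (--diff) before deciding whether to extract, so a large archive is read several times. A possible workaround, sketched under the assumption that a marker file is an acceptable substitute for that check, is to call tar directly and let creates handle the "already done" case. The marker name .data-backup.extracted is hypothetical, and tar is assumed to be GNU tar on the target.

```yaml
- name: Unpack data dump with a single pass over the archive
  shell: >
    tar --extract -C {{ destination }}/sync
    --use-compress-program=pbzip2 --exclude=data/tmp
    -f /tmp/data-backup.tar.bz2
    && touch {{ destination }}/sync/.data-backup.extracted
  args:
    # Hypothetical marker file: the task is skipped when it already exists.
    creates: "{{ destination }}/sync/.data-backup.extracted"
  tags:
    - restore
    - data-restore
```

The trade-off is that Ansible no longer detects a changed archive or modified files in dest; removing the marker forces a new extraction.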

