URGENT: cannot complete wal-e push after disk issues


Ben Hitz

Apr 29, 2016, 7:16:32 PM
to wal-e

So, I let my disk fill up (shame!) on my server.  I stopped and increased EBS volume size, so plenty of space now.

However, my wal-e pushes are failing, and I have about 8 files left in my pg_xlog. The critical message is:

lzop: No space left on device: <stdout>


I guess my plan would be to manually lzop the files and copy them to S3, then shut this bad boy down and switch to a new server?

Here is what I get if I run it manually (similar messages appear in the logs every minute or so):

postgres@ip-172-31-33-208:~/9.3/main$ /opt/wal-e/bin/envfile --config ~postgres/.aws/credentials --section default --upper -- /opt/wal-e/bin/wal-e --s3-prefix="$(cat /etc/postgresql/9.3/main/wale_s3_prefix)" wal-push "pg_xlog/00000014000006F6000000D6"

wal_e.main   INFO     MSG: starting WAL-E

        DETAIL: The subcommand is "wal-push".

        STRUCTURED: time=2016-04-29T22:54:44.566111-00 pid=5623

wal_e.worker.upload INFO     MSG: begin archiving a file

        DETAIL: Uploading "pg_xlog/00000014000006F6000000D6" to "s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D6.lzo".

        STRUCTURED: time=2016-04-29T22:54:44.578467-00 pid=5623 action=push-wal key=s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D6.lzo prefix=production/ seg=00000014000006F6000000D6 state=begin

wal_e.worker.upload INFO     MSG: begin archiving a file

        DETAIL: Uploading "pg_xlog/00000014000006F6000000D7" to "s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D7.lzo".

        STRUCTURED: time=2016-04-29T22:54:44.580494-00 pid=5623 action=push-wal key=s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D7.lzo prefix=production/ seg=00000014000006F6000000D7 state=begin

wal_e.worker.upload INFO     MSG: begin archiving a file

        DETAIL: Uploading "pg_xlog/00000014000006F6000000D8" to "s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D8.lzo".

        STRUCTURED: time=2016-04-29T22:54:44.583706-00 pid=5623 action=push-wal key=s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D8.lzo prefix=production/ seg=00000014000006F6000000D8 state=begin

lzop: No space left on device: <stdout>

lzop: No space left on device: <stdout>

lzop: No space left on device: <stdout>

Traceback (most recent call last):

  File "/opt/wal-e/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run

    result = self._run(*self.args, **self.kwargs)

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/upload.py", line 50, in __call__

    self.gpg_key_id)

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/worker_util.py", line 34, in do_lzop_put

    pass

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 115, in __exit__

    command.finish()

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 204, in finish

    .format(" ".join(self._command), retcode))

UserCritical: CRITICAL: MSG: pipeline process did not exit gracefully

DETAIL: "lzop -c" had terminated with the exit status 1.

STRUCTURED: time=2016-04-29T22:54:44.681728-00 pid=5623

<Greenlet at 0x7ff6a7b92730: <wal_e.worker.upload.WalUploader object at 0x7ff6a7b754d0>(<wal_e.worker.pg.wal_transfer.WalSegment object at)> failed with UserCritical


Traceback (most recent call last):

  File "/opt/wal-e/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run

    result = self._run(*self.args, **self.kwargs)

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/upload.py", line 50, in __call__

    self.gpg_key_id)

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/worker_util.py", line 34, in do_lzop_put

    pass

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 115, in __exit__

    command.finish()

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 204, in finish

    .format(" ".join(self._command), retcode))

UserCritical: CRITICAL: MSG: pipeline process did not exit gracefully

DETAIL: "lzop -c" had terminated with the exit status 1.

STRUCTURED: time=2016-04-29T22:54:44.684522-00 pid=5623

<Greenlet at 0x7ff6a7b92cd0: <wal_e.worker.upload.WalUploader object at 0x7ff6a7b754d0>(<wal_e.worker.pg.wal_transfer.WalSegment object at)> failed with UserCritical


Traceback (most recent call last):

  File "/opt/wal-e/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run

    result = self._run(*self.args, **self.kwargs)

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/upload.py", line 50, in __call__

    self.gpg_key_id)

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/worker_util.py", line 34, in do_lzop_put

    pass

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 115, in __exit__

    command.finish()

  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 204, in finish

    .format(" ".join(self._command), retcode))

UserCritical: CRITICAL: MSG: pipeline process did not exit gracefully

DETAIL: "lzop -c" had terminated with the exit status 1.

STRUCTURED: time=2016-04-29T22:54:44.685912-00 pid=5623

<Greenlet at 0x7ff6a7b92c30: <wal_e.worker.upload.WalUploader object at 0x7ff6a7b754d0>(<wal_e.worker.pg.wal_transfer.WalSegment object at)> failed with UserCritical


wal_e.main   CRITICAL MSG: pipeline process did not exit gracefully

        DETAIL: "lzop -c" had terminated with the exit status 1.

        STRUCTURED: time=2016-04-29T22:54:44.686046-00 pid=5623



Here is my disk usage.



postgres@ip-172-31-33-208:~/9.3/main$ df -H

Filesystem      Size  Used Avail Use% Mounted on
udev             32G   13k   32G   1% /dev
tmpfs           6.4G  377k  6.4G   1% /run
/dev/xvda1      127G   65G   57G  54% /
none            4.1k     0  4.1k   0% /sys/fs/cgroup
none            5.3M     0  5.3M   0% /run/lock
none             32G     0   32G   0% /run/shm
none            105M     0  105M   0% /run/user
overflow        1.1M   37k  1.1M   4% /tmp

Daniel Farina

Apr 29, 2016, 7:50:02 PM
to cisc...@gmail.com, wal-e
On Fri, Apr 29, 2016 at 4:16 PM Ben Hitz <cisc...@gmail.com> wrote:

So, I let my disk fill up (shame!) on my server.  I stopped and increased EBS volume size, so plenty of space now.

However, my wal-e pushes are failing, and I have about 8 files left in my pg_xlog. The critical message is:

lzop: No space left on device: <stdout>


It looks like you need more temp space. Your "overflow" file system, mounted on /tmp, is tiny (1.1M) and can't handle WAL-E's temp files.

If you need a quick fix, consider setting `TMPDIR` to any directory that has plenty of free space and is writable by WAL-E.
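
For anyone hitting the same error, here is a minimal Python 3 sketch (not part of WAL-E) for checking how much room the usual temp directories have before choosing a `TMPDIR`. The helper names `free_bytes` and `pick_tmpdir` and the 64 MiB threshold are assumptions made for illustration. The only facts relied on are that a PostgreSQL WAL segment is 16 MiB by default and that `shutil.disk_usage` reports free space for the filesystem holding a path.

```python
import os
import shutil
import tempfile

# A WAL segment is 16 MiB by default; 64 MiB is an arbitrary safety margin (assumption).
MIN_FREE_BYTES = 64 * 1024 * 1024


def free_bytes(path):
    """Return the bytes available on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def pick_tmpdir(candidates, min_free=MIN_FREE_BYTES):
    """Return the first writable directory with at least `min_free` bytes free, else None."""
    for path in candidates:
        if (os.path.isdir(path)
                and os.access(path, os.W_OK)
                and free_bytes(path) >= min_free):
            return path
    return None


if __name__ == "__main__":
    # Report the directory Python would use now, plus the two usual suspects.
    for path in (tempfile.gettempdir(), "/tmp", "/var/tmp"):
        if os.path.isdir(path):
            print("%-12s %12d bytes free" % (path, free_bytes(path)))
    print("suggested TMPDIR:", pick_tmpdir(["/tmp", "/var/tmp"]))
```

With a /tmp like the 1.1M "overflow" mount in the df output above, `pick_tmpdir` would skip /tmp and fall through to the next candidate that has enough room.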

Ben Hitz

Apr 29, 2016, 8:17:53 PM
to wal-e
You are a hero, Sir. My guess is that AWS or Ubuntu created that /tmp as overflow when the disk filled up.

I just did
TMPDIR='/var/tmp'; export TMPDIR
(which is on /) and wal-e pushed.
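
As a rough illustration of why the export helps: Python's standard `tempfile` module consults `TMPDIR` when choosing where to put temp files. The sketch below assumes WAL-E creates its temp files through that module, which this thread suggests but does not confirm. The helper name `tempdir_with_env` is made up for the example.

```python
import os
import tempfile


def tempdir_with_env(tmpdir):
    """Return the directory tempfile would use if TMPDIR were set to `tmpdir`.

    If `tmpdir` is not a usable, writable directory, tempfile falls back to
    its other candidates (TEMP, TMP, /tmp, /var/tmp, ...).
    """
    old_env = os.environ.get("TMPDIR")
    old_cached = tempfile.tempdir
    os.environ["TMPDIR"] = tmpdir
    tempfile.tempdir = None  # drop the cached choice so TMPDIR is read again
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the environment and the cached value we found on entry.
        if old_env is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cached


if __name__ == "__main__":
    print(tempdir_with_env("/var/tmp"))
```

Note that `tempfile` caches its choice on first use, so the variable has to be in the environment before the process starts, as with the `export` above.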