URGENT: cannot complete wal-e push after disk issues


Ben Hitz

Apr 29, 2016, 7:16:32 PM

to wal-e

So, I let the disk on my server fill up (shame!). I stopped the instance and increased the EBS volume size, so there is plenty of space now.

However, my wal-e pushes are failing; I have about 8 files left in my pg_xlog:

Critical message:

lzop: No space left on device: <stdout>
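To see exactly which segments PostgreSQL is still waiting to archive, you can look in pg_xlog/archive_status: every segment whose archive_command has not yet succeeded has a <segment>.ready marker there, which PostgreSQL renames to <segment>.done once archiving works. A minimal sketch in Python 3 (not part of WAL-E; the function name pending_segments is hypothetical) that lists them:

```python
import os
import re

# Plain WAL segment names are 24 hex digits: timeline, log id, segment.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def pending_segments(xlog_dir):
    """Return the WAL segments that still have a .ready marker.

    .ready markers for .history and .backup files are skipped here,
    so only ordinary segments are reported.
    """
    status_dir = os.path.join(xlog_dir, "archive_status")
    names = []
    for entry in os.listdir(status_dir):
        base, ext = os.path.splitext(entry)
        if ext == ".ready" and SEGMENT_RE.match(base):
            names.append(base)
    return sorted(names)
```

Pointed at the cluster's pg_xlog directory (for a Debian/Ubuntu 9.3 install, presumably /var/lib/postgresql/9.3/main/pg_xlog), this should list the roughly 8 stuck segments in archiving order.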

I guess my plan would be to manually lzop the files and copy them to S3, then shut this bad boy down and switch to a new server?

Here is what I get if I run it manually (similar messages show up in the logs every minute or so):

postgres@ip-172-31-33-208:~/9.3/main$ /opt/wal-e/bin/envfile --config ~postgres/.aws/credentials --section default --upper -- /opt/wal-e/bin/wal-e --s3-prefix="$(cat /etc/postgresql/9.3/main/wale_s3_prefix)" wal-push "pg_xlog/00000014000006F6000000D6"
wal_e.main INFO MSG: starting WAL-E
DETAIL: The subcommand is "wal-push".
STRUCTURED: time=2016-04-29T22:54:44.566111-00 pid=5623
wal_e.worker.upload INFO MSG: begin archiving a file
DETAIL: Uploading "pg_xlog/00000014000006F6000000D6" to "s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D6.lzo".
STRUCTURED: time=2016-04-29T22:54:44.578467-00 pid=5623 action=push-wal key=s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D6.lzo prefix=production/ seg=00000014000006F6000000D6 state=begin
wal_e.worker.upload INFO MSG: begin archiving a file
DETAIL: Uploading "pg_xlog/00000014000006F6000000D7" to "s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D7.lzo".
STRUCTURED: time=2016-04-29T22:54:44.580494-00 pid=5623 action=push-wal key=s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D7.lzo prefix=production/ seg=00000014000006F6000000D7 state=begin
wal_e.worker.upload INFO MSG: begin archiving a file
DETAIL: Uploading "pg_xlog/00000014000006F6000000D8" to "s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D8.lzo".
STRUCTURED: time=2016-04-29T22:54:44.583706-00 pid=5623 action=push-wal key=s3://encoded-backups-prod/production/wal_005/00000014000006F6000000D8.lzo prefix=production/ seg=00000014000006F6000000D8 state=begin
lzop: No space left on device: <stdout>
Traceback (most recent call last):
  File "/opt/wal-e/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/upload.py", line 50, in __call__
    self.gpg_key_id)
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/worker_util.py", line 34, in do_lzop_put
    pass
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 115, in __exit__
    command.finish()
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 204, in finish
    .format(" ".join(self._command), retcode))
UserCritical: CRITICAL: MSG: pipeline process did not exit gracefully
DETAIL: "lzop -c" had terminated with the exit status 1.
STRUCTURED: time=2016-04-29T22:54:44.681728-00 pid=5623
<Greenlet at 0x7ff6a7b92730: <wal_e.worker.upload.WalUploader object at 0x7ff6a7b754d0>(<wal_e.worker.pg.wal_transfer.WalSegment object at)> failed with UserCritical
Traceback (most recent call last):
  File "/opt/wal-e/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/upload.py", line 50, in __call__
    self.gpg_key_id)
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/worker_util.py", line 34, in do_lzop_put
    pass
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 115, in __exit__
    command.finish()
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 204, in finish
    .format(" ".join(self._command), retcode))
UserCritical: CRITICAL: MSG: pipeline process did not exit gracefully
DETAIL: "lzop -c" had terminated with the exit status 1.
STRUCTURED: time=2016-04-29T22:54:44.684522-00 pid=5623
<Greenlet at 0x7ff6a7b92cd0: <wal_e.worker.upload.WalUploader object at 0x7ff6a7b754d0>(<wal_e.worker.pg.wal_transfer.WalSegment object at)> failed with UserCritical
Traceback (most recent call last):
  File "/opt/wal-e/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/upload.py", line 50, in __call__
    self.gpg_key_id)
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/worker/worker_util.py", line 34, in do_lzop_put
    pass
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 115, in __exit__
    command.finish()
  File "/opt/wal-e/local/lib/python2.7/site-packages/wal_e/pipeline.py", line 204, in finish
    .format(" ".join(self._command), retcode))
UserCritical: CRITICAL: MSG: pipeline process did not exit gracefully
DETAIL: "lzop -c" had terminated with the exit status 1.
STRUCTURED: time=2016-04-29T22:54:44.685912-00 pid=5623
<Greenlet at 0x7ff6a7b92c30: <wal_e.worker.upload.WalUploader object at 0x7ff6a7b754d0>(<wal_e.worker.pg.wal_transfer.WalSegment object at)> failed with UserCritical
wal_e.main CRITICAL MSG: pipeline process did not exit gracefully
DETAIL: "lzop -c" had terminated with the exit status 1.
STRUCTURED: time=2016-04-29T22:54:44.686046-00 pid=5623

Here is my disk usage.

postgres@ip-172-31-33-208:~/9.3/main$ df -H
Filesystem      Size  Used Avail Use% Mounted on
udev             32G   13k   32G   1% /dev
tmpfs           6.4G  377k  6.4G   1% /run
/dev/xvda1      127G   65G   57G  54% /
none            4.1k     0  4.1k   0% /sys/fs/cgroup
none            5.3M     0  5.3M   0% /run/lock
none             32G     0   32G   0% /run/shm
none            105M     0  105M   0% /run/user
overflow        1.1M   37k  1.1M   4% /tmp

Daniel Farina

Apr 29, 2016, 7:50:02 PM

to cisc...@gmail.com, wal-e

On Fri, Apr 29, 2016 at 4:16 PM Ben Hitz <cisc...@gmail.com> wrote:

So, I let my disk fill up (shame!) on my server. I stopped and increased EBS volume size, so plenty of space now.

However, my wal-e pushes are failing; I have about 8 files left in my pg_xlog:
Critical message:
lzop: No space left on device: <stdout>

It looks like you need more temp space. Your "overflow" file system is tiny (1.1M) and can't hold WAL-E's temporary files.
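The arithmetic behind this is easy to check. WAL-E appears to write each segment's lzop output to a temporary file before uploading it, and a WAL segment is 16 MB by default, so a 1.1M /tmp cannot hold even one. A rough sketch in Python 3 (temp_space_report and the one-segment-per-upload estimate are assumptions, not WAL-E code) that compares the free space in the temp directory Python would pick against that need:

```python
import shutil
import tempfile

# PostgreSQL's default WAL segment size (16 MiB).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def temp_space_report(concurrent_uploads=3, path=None):
    """Return (temp_dir, free_bytes, needed_bytes, ok).

    Assumption: each concurrent upload may buffer up to one full
    segment of compressed output; lzop usually shrinks WAL, but
    incompressible data can come out slightly larger than the input.
    """
    temp_dir = path or tempfile.gettempdir()
    free = shutil.disk_usage(temp_dir).free
    needed = concurrent_uploads * WAL_SEGMENT_BYTES
    return temp_dir, free, needed, free >= needed
```

With the df output in the original post, /tmp has about 1.1M free, while three concurrent uploads (as in the log) could need about 50 MB, so the check fails; the same check against / would pass easily.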

If you need a quick fix, consider setting `TMPDIR` to point somewhere roomy that WAL-E can write to.
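Python's tempfile module, which WAL-E presumably relies on for these files, checks TMPDIR first (then TEMP and TMP) before falling back to /tmp, so pointing TMPDIR at a directory on the large root volume should be enough. A minimal sketch for Python 3.7+ (the helper names are hypothetical) that builds such an environment and confirms that a fresh Python process honors it:

```python
import os
import subprocess
import sys


def env_with_tmpdir(tmpdir):
    """Copy of the current environment with TMPDIR pointed at tmpdir."""
    env = dict(os.environ)
    env["TMPDIR"] = tmpdir
    return env


def child_tempdir(env):
    """Ask a new Python process which temp directory it would use."""
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Since PostgreSQL runs archive_command through the shell, the same effect should be achievable by prefixing the existing command with TMPDIR=/var/tmp/wal-e, where /var/tmp/wal-e is a hypothetical directory on the root volume that must exist and be writable by the postgres user.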

Ben Hitz

Apr 29, 2016, 8:17:53 PM

to wal-e

You are a hero, Sir. My guess is that AWS or Ubuntu mounted that tiny "overflow" file system on /tmp when the disk filled up.