Possible greenlet issue during backup-fetch

Dave Kichler

Mar 11, 2015, 10:57:10 PM
to wa...@googlegroups.com
About 8 days ago I started noticing failures while performing backup-fetches. The output looks something like this (the wrapper only sets a few Swift auth and container params):

$ envs/wal-e/bin/wal-e-wrapper-restore.sh --terse backup-fetch /var/lib/pgsql/9.2/data_next LATEST
keystoneclient.httpclient WARNING  Failed to retrieve management_url from token
keystoneclient.httpclient WARNING  Failed to retrieve management_url from token
keystoneclient.httpclient WARNING  Failed to retrieve management_url from token
lzop: Invalid argument: <stdin>
lzop: <stdin>: Compressed data violation
wal_e.retries WARNING  MSG: retrying after encountering exception
        DETAIL: Exception information dump:
        Traceback (most recent call last):
          File "/var/lib/pgsql/envs/wal-e/lib/python2.7/site-packages/wal_e/retries.py", line 62, in shim
            return f(*args, **kwargs)
          File "/var/lib/pgsql/envs/wal-e/lib/python2.7/site-packages/wal_e/worker/swift/swift_worker.py", line 73, in fetch_partition
            TarPartition.tarfile_extract(pl.stdout, self.local_root)
          File "/var/lib/pgsql/envs/wal-e/lib/python2.7/site-packages/wal_e/tar_partition.py", line 261, in tarfile_extract
            bufsize=pipebuf.PIPE_BUF_BYTES)
          File "/usr/lib64/python2.7/tarfile.py", line 1690, in open
            **kwargs)
          File "/usr/lib64/python2.7/tarfile.py", line 1574, in __init__
            self.firstmember = self.next()
          File "/usr/lib64/python2.7/tarfile.py", line 2338, in next
            raise ReadError("empty file")
        ReadError: empty file

        HINT: A better error message should be written to handle this exception.  Please report this output and, if possible, the situation under which it arises.
        STRUCTURED: time=2015-03-12T02:25:52.919447-00 pid=24675
wal_e.retries WARNING  MSG: retrying after encountering exception
        DETAIL: Exception information dump:
        Traceback (most recent call last):
          File "/var/lib/pgsql/envs/wal-e/lib/python2.7/site-packages/wal_e/retries.py", line 62, in shim
            return f(*args, **kwargs)
          File "/var/lib/pgsql/envs/wal-e/lib/python2.7/site-packages/wal_e/worker/swift/swift_worker.py", line 78, in fetch_partition
            raise exc
        AssertionError: This socket is already used by another greenlet: <bound method Waiter.switch of <gevent.hub.Waiter object at 0x7f1e581b76e0>>

        HINT: A better error message should be written to handle this exception.  Please report this output and, if possible, the situation under which it arises.
        STRUCTURED: time=2015-03-12T02:25:53.429872-00 pid=24675
wal_e.retries WARNING  MSG: retrying after encountering exception
        DETAIL: Exception information dump:
        Traceback (most recent call last):
          File "/var/lib/pgsql/envs/wal-e/lib/python2.7/site-packages/wal_e/retries.py", line 62, in shim
            return f(*args, **kwargs)
          File "/var/lib/pgsql/envs/wal-e/lib/python2.7/site-packages/wal_e/worker/swift/swift_worker.py", line 78, in fetch_partition
            raise exc
        OSError: [Errno 32] Broken pipe

The AssertionError would seem to indicate some sort of issue with greenlet synchronization.
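
As a side note on the first traceback: `tarfile` raises `ReadError: empty file` when the stream it is handed yields no bytes at all, which would be consistent with lzop aborting on the "Compressed data violation" before writing anything to the pipe. That reading of the log is an assumption, not something confirmed here. A minimal standard-library sketch (Python 3 syntax rather than the 2.7 in the traceback; `describe_stream` is a hypothetical helper, not part of wal-e):

```python
import io
import tarfile


def describe_stream(raw_bytes):
    """Return 'ok' if raw_bytes opens as a tar stream, else the ReadError text."""
    try:
        # mode="r|" reads the data as a non-seekable stream, like a pipe.
        with tarfile.open(fileobj=io.BytesIO(raw_bytes), mode="r|"):
            return "ok"
    except tarfile.ReadError as exc:
        return str(exc)


# Zero bytes stand in for the output of a decompressor that exited early.
print(describe_stream(b""))
```

With an empty input this reports the same "empty file" message seen in the traceback, so that error may only be a downstream symptom of the failed download or decompression.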

This is on a RHEL 6.6 machine, with the following dependencies installed in a Python 2.7 virtualenv:
$ pip list
argparse (1.2.1)
azure (0.8.4)
Babel (1.3)
boto (2.32.1)
futures (2.2.0)
gevent (1.0.1)
greenlet (0.4.4)
iso8601 (0.1.10)
lockfile (0.10.2)
netaddr (0.7.12)
oslo.config (1.4.0)
pbr (0.10.0)
pip (1.5.6)
prettytable (0.7.2)
python-daemon (1.6.1)
python-keystoneclient (0.11.1)
python-swiftclient (2.3.1.60.gc9f79e6)
pytz (2014.7)
requests (2.4.3)
setuptools (3.6)
simplejson (3.6.4)
six (1.8.0)
stevedore (1.0.0)
wal-e (0.8c2)
wsgiref (0.1.2)
(wal-e)

I tried downgrading wal-e in case this was related to recent releases, but the same behaviour is present in 0.8a1 and 0.7.0.

I haven't had a chance to dig any deeper into this issue, but figured I'd post it here in case anyone else has run into it or the cause is more obvious to other group members.

Thanks,
Dave 

Jeff Frost

Mar 11, 2015, 11:14:35 PM
to dkic...@gmail.com, wa...@googlegroups.com
The first thing I would do is try to fetch a file with the swiftclient CLI tool, just to make sure Rackspace(?) didn't start requiring a newer version of swiftclient. I'd also validate your API keys in case there's a typo.

Dave Kichler

Mar 12, 2015, 12:20:50 PM
to wa...@googlegroups.com, dkic...@gmail.com
Thanks for the suggestion, Jeff, but the swift client appears to be the newest version available and is still working as expected (again, the wrapper only includes auth params):

$ envs/wal-e/bin/swift-wrapper.sh --os-region-name ORD download stg-wale /wal_005/000000050000021D0000003F.lzo
wal_005/000000050000021D0000003F.lzo [auth 0.390s, headers 0.472s, total 0.906s, 9.754 MB/s]

Dave Kichler

Mar 13, 2015, 5:28:57 PM
to wa...@googlegroups.com, dkic...@gmail.com
Although the swiftclient CLI could still download a file, as shown in my last reply, swiftclient did turn out to be the source of the error I reported. I had been installing it by pointing pip at the GitHub repo, which installed version 2.3.1.60.gc9f79e6, as listed in my original post. Uninstalling that version and reinstalling from the standard repos installed version 2.3.1, which seems to have resolved the problem.
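
For anyone who wants to check their own virtualenv for the same kind of mismatch, here is a rough sketch that flags installed packages whose version string looks like a build from a git checkout. It assumes the ".<commit count>.g<short hash>" suffix seen in 2.3.1.60.gc9f79e6 is the marker to look for. The helper names are hypothetical, and it needs Python 3.8+ for `importlib.metadata`, so it will not run in the 2.7 environment above as-is.

```python
import re
from importlib import metadata

# Assumption: a version ending in ".<commit count>.g<short git hash>" came
# from a git checkout rather than from a tagged release.
_GIT_BUILD = re.compile(r"\.\d+\.g[0-9a-f]{7,}$")


def is_git_build(version):
    """True if the version string carries a git-describe style suffix."""
    return bool(_GIT_BUILD.search(version))


def find_git_builds():
    """Return sorted (name, version) pairs for installed git-built packages."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        version = dist.version or ""
        if name and is_git_build(version):
            found.append((name, version))
    return sorted(found)


if __name__ == "__main__":
    for name, version in find_git_builds():
        print(f"{name} ({version})")
```

A match only says the package was not installed from a tagged release; whether that build is actually the cause of a failure still has to be confirmed by reinstalling, as described above.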