OSError: [Errno 14] Bad address


Jeff Bogatay

Feb 4, 2015, 5:00:17 PM
to s3...@googlegroups.com
I am in the process of crafting an s3ql-backed backup solution. During development/testing I left the store mounted, installed some system updates, and rebooted.

Now I am unable to mount and/or check the store.   Running 2.12 on ArchLinux.  It has been several hours since I last wrote to the store.

My last attempt was to delete the local metadata and have it rebuilt.   Same error as below.
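
For reference, the check is just the standard invocation with debugging enabled; something like the following (bucket and prefix are placeholders for my actual values):

    fsck.s3ql --debug s3://<bucket>/<prefix>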

Not sure what to do next or how to recover.   Are these stores typically this fragile?

Also, as a test I created a fresh mount, wrote to it, unmounted it, and remounted it without any issues.



2015-02-04 16:56:35.635 9617:MainThread s3ql.deltadump.dump_metadata: dump_table(ext_attributes): writing 0 rows
2015-02-04 16:56:35.635 9617:MainThread s3ql.fsck.main: Compressing and uploading metadata...
2015-02-04 16:56:35.635 9617:MainThread s3ql.backends.s3c.open_write: open_write(s3ql_metadata_new): start
2015-02-04 16:56:50.247 9617:MainThread s3ql.backends.s3c.close: ObjectW(s3ql_metadata_new).close(): start
2015-02-04 16:56:50.247 9617:MainThread s3ql.backends.s3c._do_request: preparing PUT /s3ql_metadata_new?None, qs=None
2015-02-04 16:56:50.247 9617:MainThread s3ql.backends.s3c._send_request: _send_request(): PUT /s3ql_metadata_new
2015-02-04 16:56:50.248 9617:MainThread s3ql.backends.common.wrapped: Encountered ConnectionClosed exception (connection closed unexpectedly), retrying call to ObjectW.close for the 1-th time...
2015-02-04 16:56:50.268 9617:MainThread s3ql.backends.s3c.close: ObjectW(s3ql_metadata_new).close(): start
2015-02-04 16:56:50.268 9617:MainThread s3ql.backends.s3c._do_request: preparing PUT /s3ql_metadata_new?None, qs=None
2015-02-04 16:56:50.268 9617:MainThread s3ql.backends.s3c._send_request: _send_request(): PUT /s3ql_metadata_new
2015-02-04 16:56:50.727 9617:MainThread root.excepthook: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/fsck.s3ql", line 9, in <module>
    load_entry_point('s3ql==2.12', 'console_scripts', 'fsck.s3ql')()
  File "/usr/lib/python3.4/site-packages/s3ql/fsck.py", line 1287, in main
    is_compressed=True)
  File "/usr/lib/python3.4/site-packages/s3ql/backends/common.py", line 46, in wrapped
    return method(*a, **kw)
  File "/usr/lib/python3.4/site-packages/s3ql/backends/common.py", line 258, in perform_write
    return fn(fh)
  File "/usr/lib/python3.4/site-packages/s3ql/backends/comprenc.py", line 642, in __exit__
    self.close()
  File "/usr/lib/python3.4/site-packages/s3ql/backends/comprenc.py", line 636, in close
    self.fh.close()
  File "/usr/lib/python3.4/site-packages/s3ql/backends/common.py", line 46, in wrapped
    return method(*a, **kw)
  File "/usr/lib/python3.4/site-packages/s3ql/backends/s3c.py", line 844, in close
    headers=self.headers, body=self.fh)
  File "/usr/lib/python3.4/site-packages/s3ql/backends/s3c.py", line 407, in _do_request
    query_string=query_string, body=body)
  File "/usr/lib/python3.4/site-packages/s3ql/backends/s3c.py", line 649, in _send_request
    copyfileobj(body, self.conn, BUFSIZE)
  File "/usr/lib/python3.4/shutil.py", line 69, in copyfileobj
    fdst.write(buf)
  File "/usr/lib/python3.4/site-packages/dugong/__init__.py", line 653, in write
    eval_coroutine(self.co_write(buf), self.timeout)
  File "/usr/lib/python3.4/site-packages/dugong/__init__.py", line 1396, in eval_coroutine
    if not next(crt).poll(timeout=timeout):
  File "/usr/lib/python3.4/site-packages/dugong/__init__.py", line 679, in co_write
    yield from self._co_send(buf)
  File "/usr/lib/python3.4/site-packages/dugong/__init__.py", line 619, in _co_send
    len_ = self._sock.send(buf)
  File "/usr/lib/python3.4/ssl.py", line 679, in send
    v = self._sslobj.write(data)
OSError: [Errno 14] Bad address




Nikolaus Rath

Feb 4, 2015, 6:58:31 PM
to s3...@googlegroups.com
Jeff Bogatay <je...@bogatay.com> writes:
> I am in the process of crafting an s3ql-backed backup solution. During
> development/testing I left the store mounted, installed some system
> updates, and rebooted.
>
> Now I am unable to mount and/or check the store. Running 2.12 on
> ArchLinux. It has been several hours since I last wrote to the store.
>
> My last attempt was to delete the local metadata and have it rebuilt.
> Same error as below.
>
> Not sure what to do next or how to recover. Are these stores typically
> this fragile?

What do you mean by "the store"? Are you talking about a remote
storage server? In that case the fragility obviously depends on the
server.

> Also, as a test I created a fresh mount, wrote to it, unmounted it,
> and remounted it without any issues.
>
> 2015-02-04 16:56:35.635 9617:MainThread s3ql.deltadump.dump_metadata:
> dump_table(ext_attributes): writing 0 rows
[...]

I am not sure what I'm looking at here. First you say it works, but then
you quote an error message (and the formatting is pretty messed
up). Can you be more precise as to when exactly the error occurs (and is
it always the same)?


Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Jeff Bogatay

Feb 5, 2015, 9:37:47 AM
to s3...@googlegroups.com
Sorry,

"The Store" = the backend/remote storage.   In this case it's Amazon S3/USA.

With respect to fragility, I was simply referring to forgetting to unmount the remote filesystem (or a power failure, or whatever). I've seen posts from people who leave remote filesystems mounted (via upstart/systemd), and if a reboot without unmounting can corrupt the filesystem to the point that it's unmountable and unfsckable, that's an issue.

The log in the original post is a clip of the fsck.log file contained in the .s3ql directory after running fsck with the --debug option.

The part that worked was a fresh, new filesystem. I wanted to ensure that my install still worked after upgrading. Upgrading ArchLinux can sometimes be... picky.

The main backup filesystem is still unfsckable.  Here is the console output instead of the log.


Using cached metadata.
Remote metadata is outdated.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 553000 objects so far..
Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking blocks (checksums)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...

Nikolaus Rath

Feb 5, 2015, 11:55:42 AM
to s3...@googlegroups.com
Hi Jeff,

When replying to emails on this list, please do not put your reply
above the quoted text, and do not quote the entire message you are
answering. This makes it unnecessarily hard for other readers to
understand the context of your email. Instead, please cut the quoted
parts that are not relevant to your reply, and insert your responses
right after the points you're replying to (as I have done below). Thanks!

Jeff Bogatay <je...@bogatay.com> writes:
> Sorry,
>
> "The Store" = the backend/remote storage. In this case it's Amazon
> S3/USA.
>
> With respect to fragility, I was simply referring to forgetting to
> unmount the remote filesystem (or a power failure, or whatever). I've
> seen posts from people who leave remote filesystems mounted (via
> upstart/systemd), and if a reboot without unmounting can corrupt the
> filesystem to the point that it's unmountable and unfsckable, that's
> an issue.

No, this should not happen. If you reboot without unmounting, you will
need to run fsck.s3ql, but afterwards things should just work. I
actually test this regularly myself, so something must be special about
your system.

> The main backup filesystem is still unfsckable. Here is the console
> output instead of the log.
>
> Using cached metadata.
> Remote metadata is outdated.
> Checking DB integrity...
[...]
> Compressing and uploading metadata...
> Uncaught top-level exception:
[..]
> File "/usr/lib/python3.4/site-packages/dugong/__init__.py", line 619,
> in _co_send
> len_ = self._sock.send(buf)
> File "/usr/lib/python3.4/ssl.py", line 679, in send
> v = self._sslobj.write(data)
> OSError: [Errno 14] Bad address

(the formatting of the output is still messed up; in the future, please
attach it as a separate file if your mail client insists on reformatting
it)

What this error means is that your file system is in good condition, but
when fsck.s3ql tries to upload a "clean" marker to the storage server,
the operating system signals an error with the TCP/IP connection.

Are you able to try fsck'ing this file system from another computer?

If you are comfortable doing that, could you check if you get the same
error if you run fsck.s3ql with --backend-options no-ssl?
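
That is, something along these lines (with your actual storage URL in place of the placeholders):

    fsck.s3ql --backend-options no-ssl s3://<bucket>/<prefix>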

Jeff Bogatay

Feb 5, 2015, 12:11:40 PM
to s3...@googlegroups.com
> (the formatting of the output is still messed up; in the future, please
> attach it as a separate file if your mail client insists on reformatting
> it)

I'm just using the Google Groups interface, and the log (as I see it) is exactly as it shows on screen. Odd.
 

> If you are comfortable doing that, could you check if you get the same
> error if you run fsck.s3ql with --backend-options no-ssl?

no-ssl worked. The fsck completed and I was able to mount it. Is this a quirk of Amazon S3 or something? Regular mounted operations work fine.

Thanks a bunch.   S3QL's features are awesome.  :)


Nikolaus Rath

Feb 5, 2015, 1:43:03 PM
to s3...@googlegroups.com
Jeff Bogatay <je...@bogatay.com> writes:
>> If you are comfortable doing that, could you check if you get the
>> same error if you run fsck.s3ql with --backend-options no-ssl?
>
> no-ssl worked. The fsck completed and I was able to mount it. Is this
> a quirk of Amazon S3 or something? Regular mounted operations work
> fine.

Not really, it works fine for many people. If you have time to debug
this further, could you maybe try to get a traffic dump with Wireshark
when the error occurs?

Jeff Bogatay

Feb 5, 2015, 2:45:05 PM
to s3...@googlegroups.com
> Not really, it works fine for many people. If you have time to debug
> this further, could you maybe try to get a traffic dump with Wireshark
> when the error occurs?

Sure.

Jeff Bogatay

Feb 6, 2015, 3:36:40 PM
to s3...@googlegroups.com
At some point last night, the connection dropped (not sure why). The drive was still mounted, but dead.

I unmounted the drive and tried to remount; the filesystem was flagged as corrupt, requiring an fsck.

Same error on fsck (OSError: [Errno 14] Bad address).

I ran Wireshark, and the connection gets an Encrypted Alert (21) just before it dies.

I have a Wireshark capture of the activity at the end (after the "Checking objects" part). If it would be helpful, I will gladly send it to you off-list.
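
If anyone else wants to look for the same thing in a capture, filtering for TLS alert records (content type 21) should surface it; something along these lines, where the filename is just an example (Wireshark/tshark of this era uses the ssl.* field prefix, newer versions use tls.*):

    tshark -r capture.pcap -Y 'ssl.record.content_type == 21'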

Nikolaus Rath

Feb 10, 2015, 6:37:16 PM
to s3...@googlegroups.com
Thanks for the traffic dump. The remote server sends you a TLS alert
with code 21, and (probably in response to that) S3QL (or more
precisely, OpenSSL) resets the TCP connection. Alert code 21 means
"decryption failed", i.e. the server was unable to decrypt a message
from the client.

I'm not well versed in TLS, but I believe resetting the connection is
the proper thing to do in this situation. However, it seems that OpenSSL
isn't coping well with this, hence the "Bad address" error that it
signals to S3QL.

Which OpenSSL version are you using? Could you be affected by
https://github.com/excon/excon/issues/467?
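
If you're not sure, the following one-liner prints the OpenSSL version that Python's ssl module has loaded:

    python3 -c 'import ssl; print(ssl.OPENSSL_VERSION)'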

Jeff Bogatay

Feb 11, 2015, 9:42:46 AM
to s3...@googlegroups.com
I was running 1.0.2.

I still had 1.0.1 in my ArchLinux package cache, so I downgraded, and everything worked fine. The fsck completed without any issues.
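
For the record, the downgrade on Arch is just a matter of installing the cached package directly; something like the following, with the exact filename taken from the cache (the version string here is a placeholder):

    sudo pacman -U /var/cache/pacman/pkg/openssl-1.0.1.x-1-x86_64.pkg.tar.xz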

For anybody else out there experiencing this issue -- it's clearly an OpenSSL problem.


Thanks!

Nikolaus Rath

Feb 11, 2015, 11:27:38 AM
to s3...@googlegroups.com
It'd be great if you could file an OpenSSL bug then. You probably don't
want to stay with version 1.0.1 forever :-).

Jeff Bogatay

Mar 24, 2015, 5:03:28 PM
to s3...@googlegroups.com
Just noticed OpenSSL 1.0.2a was released and is working properly.


