S3QL crashing when backing up to cloud using WiFi (Google) but not when backing up locally or through cable


Jules F.

Dec 18, 2020, 2:10:04 PM
to s3ql
Hi,

I'm using S3QL 3.6.0 to back up to Google Cloud through WiFi from my laptop.
I'm using rsync to transfer the files to the mounted endpoint.
After a while, especially on large files, rsync will error out because S3QL will crash:
rsync: close failed on "/some/path/to/.file.iso.tyw4xE": Transport endpoint is not connected (107)

This does not happen when I back up to an S3QL mounted on a local drive or when backing up through a network cable attached to my router.

So I thought my WiFi connection might be too unstable or slow, and added a parameter like --bwlimit=4000 to rsync to cap the transfer speed below that of my connection, but it didn't help; it still crashed.

Do you have any suggestions for what I can do to prevent this from happening when backing up through WiFi using rsync? Or maybe there are fixes that can be implemented in S3QL to ensure it works properly for slower and/or unstable connections?


Thanks

Daniel Jagszent

Dec 18, 2020, 7:45:44 PM
to s3ql
Hi,

Your ~/.s3ql/mount.log should contain the exception explaining why S3QL
crashed.
If it is because of connection or request timeouts, you might want to
increase the TCP timeout for S3QL:
https://www.rath.org/s3ql-docs/backends.html#cmdoption-gs-backend-arg-tcp-timeout
(e.g. add --backend-options=tcp-timeout=60 to the mount.s3ql call)
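For example, the full call might then look something like this (the bucket name and mount point below are placeholders for your own setup):

```shell
# Raise the TCP timeout to 60 seconds; adjust the storage URL
# and mount point to your own setup.
mount.s3ql --backend-options=tcp-timeout=60 \
    gs://my-bucket/prefix /mnt/s3ql
```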

Jules F.

Dec 19, 2020, 8:06:15 AM
to s3ql
There is no error in mount.log, which makes it all the more scary. I usually don't want to continue using a file system after such an error because I worry it might be corrupted in unknown and unfixable ways (even by fsck).

Daniel Jagszent

Dec 20, 2020, 6:35:25 PM
to s3ql

> There is no error in mount.log [...]
That's quite unusual; S3QL will almost always log exceptions. If the
only difference is that you use WiFi instead of a cable when S3QL
crashes, that alone should not keep S3QL from logging an exception.
How do you start mount.s3ql? Maybe you disabled mount.log logging
altogether?
> [...]I usually don't want to continue using a file system after such
> an error because I'm thinking it might be corrupted in unknown and
> unfixable ways (even by fsck).
It's quite safe to keep using an S3QL file system after a crash. I have
some S3QL file systems that have been running since 2013. They have
crashed several times through the years, and they could always be
repaired with fsck. But always run fsck on the machine that crashed,
since that machine has the up-to-date SQLite database that fsck needs.
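A typical recovery run might look like this (the storage URL and mount point are placeholders; the important part is running fsck where the current metadata cache lives):

```shell
# Unmount first if the file system is still (half-)mounted:
umount.s3ql /mnt/s3ql || fusermount3 -u /mnt/s3ql
# Repair on the machine that crashed, so the up-to-date
# local SQLite metadata is used:
fsck.s3ql gs://my-bucket/prefix
```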

Jules F.

Dec 23, 2020, 2:48:17 PM
to s3ql
Actually, it now happens when using the network cable to connect to the internet as well, so I suspect it might be a regression in one of the latest versions. I still get no errors in mount.log between the mount time and the crash time. It works fine when backing up to a USB drive.

Daniel Jagszent

Dec 23, 2020, 3:50:31 PM
to s3ql
Maybe a run with the --debug argument will help identify the problem:
https://www.rath.org/s3ql-docs/man/mount.html

What version of libfuse3 is your OS using? What are the outputs of these
commands:
find /lib* /usr/lib* -name 'libfuse*.so.*'
fusermount3 --version

How did you install S3QL?

Jules F.

Dec 23, 2020, 5:30:42 PM
to s3ql
Hi,

Output of commands:

user ~ $ find /lib* /usr/lib* -name 'libfuse*.so.*'
/lib/x86_64-linux-gnu/libfuse.so.2.9.9
/lib/x86_64-linux-gnu/libfuse.so.2
/lib/x86_64-linux-gnu/libfuse3.so.3
/lib/x86_64-linux-gnu/libfuse3.so.3.9.0

user ~ $ fusermount3 --version
fusermount3 version: 3.9.0

I installed it with: sudo python3 setup.py install (and then set permissions on a few directories with sudo chmod -R 755 /usr/local/lib/python* /var/lib/python* /usr/local/bin/).

Now I'm in the process of another backup attempt, and it didn't fail on the 4.8 GB iso file that seemed to cause problems, but I expect it could fail on other similarly large files, as I haven't changed anything besides adding the --debug parameter.

The full mount command looks like:

mount.s3ql --debug --cachedir /mnt/s3ql_ramdisk --allow-root gs://some_bucket_name/main /home/user/the/mount/point/

Thanks

Jules F.

Dec 23, 2020, 6:21:14 PM
to s3ql
It crashed again. Now the last few lines of mount.log (with the --debug option) are:

2020-12-24 01:01:39.919 747066:Thread-1 s3ql.backends.gs._do_request: started with POST /upload/storage/v1/b/some_bucket_name/o, qs={'uploadType': 'multipart'}
2020-12-24 01:01:39.996 747066:MainThread s3ql.fs.getxattr: started with 149587, b'security.capability'
2020-12-24 01:01:40.099 747066:MainThread s3ql.fs.getxattr: started with 149587, b'security.capability'
2020-12-24 01:01:40.103 747066:MainThread s3ql.fs.getxattr: started with 149587, b'security.capability'
2020-12-24 01:01:40.211 747066:MainThread s3ql.fs.getxattr: started with 149587, b'security.capability'
2020-12-24 01:01:40.216 747066:MainThread s3ql.fs.getxattr: started with 149587, b'security.capability'
2020-12-24 01:01:40.302 747066:Thread-1 s3ql.backends.gs._parse_error_response: Server response not JSON - intermediate proxy failure?
2020-12-24 01:01:40.303 747066:MainThread s3ql.block_cache.with_event_loop: upload of 86350 failed
NoneType: None
2020-12-24 01:01:40.320 747066:Thread-1 s3ql.mount.exchook: recording exception <RequestError, code=401, reason='Unauthorized', with body data>

Any thoughts?

Thanks

Daniel Jagszent

Dec 24, 2020, 10:02:57 AM
to s3ql
Hi Jules,
[...]
2020-12-24 01:01:40.302 747066:Thread-1 s3ql.backends.gs._parse_error_response: Server response not JSON - intermediate proxy failure?
2020-12-24 01:01:40.303 747066:MainThread s3ql.block_cache.with_event_loop: upload of 86350 failed
NoneType: None
2020-12-24 01:01:40.320 747066:Thread-1 s3ql.mount.exchook: recording exception <RequestError, code=401, reason='Unauthorized', with body data>
[...]
I do not know why you don't get any exception backtrace, but S3QL did raise an exception (and the last-resort exception hook logged it). It might be that your trio version is too old or too new for S3QL. (I am using 0.15.0, and this version works well with S3QL.)

That might be the same issue as https://github.com/s3ql/s3ql/issues/224

It might be that the Google Storage backend with OAuth2 authentication is currently not working as expected anymore. Does the crash happen approximately one hour after you mounted the file system? That's the expiration time of an access token for the Google backend, as far as I know.

Could you try to use ADC instead of OAuth for authentication? https://www.rath.org/s3ql-docs/backends.html#google-storage

You could also try to change this line:
https://github.com/s3ql/s3ql/blob/release-3.6.0/src/s3ql/backends/gs.py#L443
            if exc.message == 'Invalid Credentials':
to
            if exc.message == 'Invalid Credentials' or exc.code == 401:

This change is a little too broad, as it treats any HTTP 401 status code as an access-token expiration. But since at this point several requests must already have succeeded, that might be OK.
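To make the intent concrete, here is a self-contained sketch of the proposed condition (the RequestError class below is a minimal stand-in for S3QL's real exception type, not the actual implementation):

```python
# Minimal stand-in for S3QL's backend exception, for illustration only.
class RequestError(Exception):
    def __init__(self, code, message):
        self.code = code
        self.message = message

def is_expired_token(exc):
    # Original check: only the explicit "Invalid Credentials" message.
    # Proposed widening: additionally treat any HTTP 401 as token expiry.
    return exc.message == 'Invalid Credentials' or exc.code == 401

print(is_expired_token(RequestError(401, 'Unauthorized')))  # True
```

With the original condition, a plain 401 whose message is not exactly 'Invalid Credentials' (such as the 'Unauthorized' seen in the log above) would not trigger a credential refresh; the widened check catches it.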

I do not use Google Storage as backend so sorry for these rather unspecific suggestions.

Jules F.

Dec 24, 2020, 10:43:37 AM
to s3ql
I was already using ADC. I was thinking Google Storage might be intermittently throwing errors that are not fully handled in S3QL, causing my issues. I've now switched to S3 storage. If it works properly, I think I'll stick with S3, considering that the pricing is almost identical and that I might have fewer issues with it (as it's more popular, and its name is actually contained within S3QL's name).

Jules F.

Dec 25, 2020, 12:03:52 PM
to s3ql
Backed up to S3, everything went without issues, will stick to S3.

Nikolaus Rath

Dec 27, 2020, 6:14:43 AM
to s3...@googlegroups.com
This is *very* unlikely. Can you try running mount.s3ql in foreground on
the console (--fg) and watch how it terminates?

The only way for it to terminate without writing details into mount.log
is for it to segfault (which should also be visible in your kernel
logs). And even in this case, you should see details in
~/.s3ql/mount.s3ql_crit.log.
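A debugging session along those lines might look like this (the storage URL and mount point are placeholders; the grep pattern is just one way to spot a segfault):

```shell
# Run in the foreground so you can watch how mount.s3ql terminates:
mount.s3ql --fg --debug gs://my-bucket/prefix /mnt/s3ql

# After a crash, look for a segfault in the kernel log:
dmesg | grep -i segfault
# ...and check the last-resort crash log:
cat ~/.s3ql/mount.s3ql_crit.log
```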

(I am assuming you have ruled out permissions and disk full issues for
the log directory).

I would be very hesitant to use S3QL until you have figured this out -
no matter which backend or network connection you use. Something is
fundamentally wrong with your installation.


Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Ivan Shapovalov

Dec 27, 2020, 6:17:08 AM
to Nikolaus Rath, s3...@googlegroups.com
Just a quick note: I've been experiencing an identical problem since
about a month ago, with the B2 backend, no compression or encryption,
and a stable Internet connection. The logs are inconclusive: no
backtraces, nothing. The problem is reproducible 100% of the time.
I'll send the full logs and other info shortly.

--
Ivan Shapovalov / intelfx /

Jules F.

Dec 29, 2020, 1:43:22 PM
to s3ql
> I do not know why you do not get any exception backtrace information but S3QL did raise an exception (and the last resort exception hook logs that). It might be that your trio version is too old or new for S3QL. (I am using 0.15.0 and this version works good with S3QL)

The installation says it's using trio 0.17.0 (and I verified this is what it's actually using).