Any way to stabilize S3QL in my convoluted Amazon Cloud Drive setup?

Mike Beaubien

Oct 15, 2016, 10:29:25 AM
to s3ql
Hi,

I'm doing the usual workaround to get s3ql on top of acd using unionfs to merge a local s3ql file system in RW mode with some additional data files stored in ACD and mounted through acd_cli in RO.

It works well when it works, but occasionally s3ql will crash with an error like:

2016-10-15 02:39:42.163 7460:MainThread s3ql.mount.unmount: Unmounting file system...
2016-10-15 02:39:42.237 7460:MainThread root.excepthook: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 740, in _get_entry
    el = self.cache[(inode, blockno)]
KeyError: (257, 45)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/mount.s3ql", line 9, in <module>
    load_entry_point('s3ql==2.15', 'console_scripts', 'mount.s3ql')()
  File "/usr/lib/s3ql/s3ql/mount.py", line 214, in main
    llfuse.main(options.single)
  File "src/llfuse/fuse_api.pxi", line 319, in llfuse.capi.main (src/llfuse/capi_linux.c:26545)
  File "src/llfuse/handlers.pxi", line 328, in llfuse.capi.fuse_read (src/llfuse/capi_linux.c:9931)
  File "src/llfuse/handlers.pxi", line 329, in llfuse.capi.fuse_read (src/llfuse/capi_linux.c:9881)
  File "/usr/lib/s3ql/s3ql/fs.py", line 1039, in read
    tmp = self._readwrite(fh, offset, length=length)
  File "/usr/lib/s3ql/s3ql/fs.py", line 1116, in _readwrite
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 718, in get
    el = self._get_entry(inode, blockno)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 775, in _get_entry
    backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 107, in wrapped
    return method(*a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 314, in perform_read
    fh = self.open_read(key)
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 156, in open_read
    fh = self.backend.open_read(key)
  File "/usr/lib/s3ql/s3ql/backends/local.py", line 83, in open_read
    fh.metadata = _read_meta(fh)
  File "/usr/lib/s3ql/s3ql/backends/local.py", line 245, in _read_meta
    buf = fh.read(9)
OSError: [Errno 70] Communication error on send

It's probably just a temporary error for some network reason. Is there any way to get s3ql to ignore and retry on these errors?

Thanks

Nikolaus Rath

Oct 15, 2016, 5:16:42 PM
to s3...@googlegroups.com
On Oct 15 2016, Mike Beaubien <mike.b...@gmail.com> wrote:
> Hi,
>
> I'm doing the usual workaround to get s3ql on top of acd using unionfs to
> merge a local s3ql file system in RW mode with some additional data files
> stored in ACD and mounted through acd_cli in RO.

I very much hope that this is not truly a "usual" configuration. Why
don't you use the S3 backend?

>
> It works well when it works, but occasionally s3ql will crash with an error
> like:
>
[..]
> buf = fh.read(9)
> OSError: [Errno 70] Communication error on send
>
> It's probably just some temporary error for whatever network reason. Is
> there anyway to get s3ql to ignore and retry on these errors?

Not in practice, no. If S3QL attempts to read data from a file and the
operating system returns an error, that typically means that something
is seriously wrong and needs immediate attention. Crashing is the best
course of action to make sure the problem is noticed. There is no way
for S3QL to determine that in this case the problem is caused by an
apparently buggy acd_cli. And even if there were a way, it would not be
feasible for S3QL to anticipate and handle such errors for every system
call that could possibly fail.

Best,
-Nikolaus
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Jonas Lippuner

Oct 26, 2016, 2:57:09 PM
to s3ql
I'm using a similar setup. Mount ACD locally with acd_cli and then use an s3ql volume inside the mounted ACD path. The reason for using ACD rather than S3 is very simple: cost.

Currently, I'm using 166 GB on ACD and I expect this to grow by a factor of 2 or 3 within the next few weeks. ACD offers unlimited storage for $60 per year, i.e. $5 per month. $5/mo buys about 167 GB of standard S3 storage (infrequent access or Glacier would be cheaper, but still no more than about 700 GB), and that's not counting any request costs. So ACD is a lot cheaper once one has more than a few hundred GB.
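For concreteness, the arithmetic above can be checked with a quick back-of-envelope script (the ~$0.03/GB-month S3 standard price is a 2016-era assumption, and request costs are ignored):

```python
# Back-of-envelope comparison; prices are 2016-era assumptions, not quotes:
# ACD: unlimited storage for $60/year. S3 standard: ~$0.03 per GB-month.
ACD_PER_MONTH = 60 / 12          # $5.00/month
S3_PER_GB_MONTH = 0.03           # request costs not included

# How much S3 storage the same $5/month buys (the break-even point).
break_even_gb = ACD_PER_MONTH / S3_PER_GB_MONTH
print("S3 storage for $5/mo: about %.0f GB" % break_even_gb)   # about 167 GB

for gb in (166, 500, 1000):
    print("%5d GB: S3 $%.2f/mo vs. ACD $5.00/mo" % (gb, gb * S3_PER_GB_MONTH))
```

Past a few hundred GB the flat ACD fee wins easily, which matches the reasoning above.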

Here's what I've found with acd_cli + s3ql.

1) Stability is markedly improved by running acd_cli in single-thread mode. Without this, I had times when even the metadata backup rotation would fail every time. Unfortunately, this cuts my bandwidth by a factor of 2 or 3.

2) S3QL is actually very good with data integrity, even when it crashes. Multiple times, I have copied tens of GB to my S3QL volume (it all goes into the cache) and then had the S3QL file system crash after uploading a few GB (see one backtrace below). However, the mount.s3ql process keeps going until all the data is uploaded (I can monitor network traffic and the ACD web interface). Once the upload finishes, I can kill mount.s3ql and run fsck.s3ql, which cleans everything up. Sometimes a few data blocks get corrupted, so some files get moved to lost+found. I can then copy those files again into the mounted S3QL volume, which is very fast because, thanks to deduplication, only the missing data blocks need to be uploaded. Then I delete the file in lost+found.

3) If I run acd_cli in multi-threaded mode (the default), S3QL crashes very quickly, but it still keeps uploading all the data as described above. However, this post-crash upload seems to be single-threaded, which again reduces bandwidth. So I've decided to run acd_cli in single-thread mode: the upload will be single-threaded after a crash anyway, this way I can at least delay the crash for a substantial amount of time, and metadata backups only work in single-threaded mode.

Here's the backtrace (from mount.log) of the error I usually see:

2016-10-24 17:58:46.920 7693:Thread-5 root.excepthook: Uncaught top-level exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/mount.py", line 64, in run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/block_cache.py", line 405, in _upload_loop
    self._do_upload(*tmp)
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/block_cache.py", line 432, in _do_upload
    % obj_id).get_obj_size()
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/backends/common.py", line 108, in wrapped
    return method(*a, **kw)
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/backends/common.py", line 339, in perform_write
    with self.open_write(key, metadata, is_compressed) as fh:
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/backends/comprenc.py", line 228, in open_write
    fh = self.backend.open_write(key, meta_raw)
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/backends/local.py", line 102, in open_write
    dest = ObjectW(tmpname)
  File "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/backends/local.py", line 304, in __init__
    self.fh = open(name, 'wb', buffering=0)
OSError: [Errno 14] Bad address: '/home/jlippuner/cloud_storage/ACD/remote/s3ql_backup/s3ql_data_/698/s3ql_data_69888#7693-140501390976768.tmp'

I wonder how hard it would be to add a native ACD interface to S3QL...

Best,
Jonas

Nikolaus Rath

Oct 26, 2016, 3:04:25 PM
to s3...@googlegroups.com
On Oct 26 2016, Jonas Lippuner <jo...@lippuner.ca> wrote:
> I'm using a similar setup. Mount ACD locally with acd_cli and then use an
> s3ql volume inside the mounted ACD path. The reason for using ACD rather
> than S3 is very simple: cost.

Well, that explains it. You get what you pay for :-).

[...]
> Here's the backtrace (from mount.log) of the error I usually see:
[...]

> "/usr/local/lib/python3.5/dist-packages/s3ql-2.20-py3.5-linux-x86_64.egg/s3ql/backends/local.py",
> line 304, in __init__
> self.fh = open(name, 'wb', buffering=0)
> OSError: [Errno 14] Bad address:
> '/home/jlippuner/cloud_storage/ACD/remote/s3ql_backup/s3ql_data_/698/s3ql_data_69888#7693-140501390976768.tmp'

Yes, that's ACD being buggy.

> I wonder how hard it would be to add a native ACD interface to S3QL...

Some people are trying, see https://github.com/s3ql/s3ql/pull/7

Mike Beaubien

Nov 2, 2016, 10:20:05 AM
to s3ql
Well, I thought I would follow up on this so people don't waste their time.

I'd recommend not using s3ql on top of acd_cli; it's too unstable. I found myself with a crashed file system and having to run fsck every day. My reasons for wanting to use s3ql were:

1. I thought the chunked storage format would be better for things like resuming movies from the middle on my Plex server.
2. The encryption it provides makes me feel like a sneaky secret agent.

Instead, I recommend you just upload your movies normally and mount using acd_cli.

For #1, chunking the files doesn't seem necessary. I don't know what black magic the acd_cli mount is doing, but my movies resume from the middle just fine, and I'd actually say they play a little faster; s3ql seems to add some overhead.

I do miss out on the encryption, but I really don't think Amazon cares about what I upload. The other thing to consider with encryption, and with s3ql in general, is that you're locking yourself into a particular technology. Right now I'm running Plex on Linux, but if I ever decide to buy a Windows license so I can watch Netflix on it, I'd have to re-download all my movies and re-upload them in some new format. Finally, if encryption is really important to you, there's a pull request for supporting encryptfs with acd_cli that seems a lot more mature than the pull request here for supporting s3ql.

Some more details on what I ended up doing:

1. I am still using unionfs. I keep everything on the local disk until I need to make some room, then I archive stuff to ACD. It's sort of a poor man's version of the disk cache s3ql provides.
2. rclone is amazing. Instead of uploading with acd_cli, which I find a little slow, I use rclone. I actually got banned for a little while from my VPS until I throttled it, because I was uploading at 50-80 MBps (capital B is intentional!), which was most of their 1 Gbps pipe. Now I upload at 20 MBps consistently. rclone also seems a little more stable than acd_cli, although its mount support is not currently as good.

Anyways, the caching + encryption stuff that s3ql does is cool, and I might revisit it if ACD ever gets official support. It would be nice, because then I could drop all my unionfs and acd_cli stuff and just use s3ql. But what I have right now is good enough for my needs. I've had one file system crash in the last two weeks, and recovery from that is a lot faster since I don't have to run fsck.

I know a lot of people are trying to get this combo working, so hopefully my post helps.

Riku Bister

Nov 6, 2016, 6:39:20 AM
to s3ql
I must recommend not using s3ql on top of acd_cli.

But I have a solution: I'm still using s3ql on Amazon Cloud Drive. It's a bit complicated, but it works. If people are interested, I can write a tutorial on how to set it up.
At the moment I have 2-3 TB in Amazon Drive via s3ql, and I can stream full HD or even a 4K movie with a bitrate over 5k. It's super fast. It can be done, but a lot of caution is needed when using it (when mounting, watch out for wipes etc.).
How a movie stream passes through: amazon_acd_cli -> s3ql -> server running ownCloud and s3ql -> WebDAV -> Kodi at home connects to the server via WebDAV -> 4K TV -> enjoy.

What I have done, in short:
a combination of unionfs, s3ql with encryption, ownCloud as a frontend, and a lot of scripts to automate everything. It's dirty, but it works, and it is fast as hell :D

The only downside of this setup is that if the server crashes, fsck.s3ql takes more than 2 hours on a single file system. I have 3 file systems: you always want to split your file systems, so that if one breaks you don't lose everything.

nikhilc

Mar 3, 2017, 1:05:34 PM
to s3ql
On Saturday, October 15, 2016 at 9:29:25 AM UTC-5, Mike Beaubien wrote:
  File "/usr/lib/s3ql/s3ql/backends/local.py", line 245, in _read_meta
    buf = fh.read(9)
OSError: [Errno 70] Communication error on send

It's probably just some temporary error for whatever network reason. Is there any way to get s3ql to ignore and retry on these errors?

The recent S3 outage seemed to trigger this error much more frequently when using s3ql with acd_cli and overlayfs.  Out of curiosity and for testing purposes, I added a retry loop to local.py, replacing buf = fh.read(9):

Line 20:
import time

Line 242:
def _read_meta(fh):
    while True:
        try:
            buf = fh.read(9)
        except OSError as e:
            if e.errno in (os.errno.ECOMM, os.errno.EFAULT):
                log.info('OSError: %s, retrying' % e)
                time.sleep(1)
                continue
            raise  # not a retryable error, fail as before
        break

ECOMM error 70 "Communication error on send" (more frequent) and EFAULT error 14 "Bad address" (rare) are the only errors I've seen so far using s3ql and acd_cli together.  The 1 s sleep interval was meant to prevent hammering ACD, but in practice it hasn't come into play: the logs show the retries occurring at wider intervals.  It would be better to reuse the retry code that already exists in s3ql, with its exponentially increasing interval.
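An exponentially increasing retry interval along those lines could look roughly like this. This is a standalone sketch with an invented function name and defaults, not s3ql's actual retry machinery:

```python
import errno
import time

# Retryable errors seen so far with acd_cli (per the post above).
RETRYABLE = (errno.ECOMM, errno.EFAULT)

def read_with_backoff(fh, size, max_tries=8, first_delay=1.0):
    """Hypothetical helper: retry fh.read() with exponential backoff."""
    delay = first_delay
    for attempt in range(max_tries):
        try:
            return fh.read(size)
        except OSError as exc:
            # Non-retryable errors, and the final attempt, fail loudly.
            if exc.errno not in RETRYABLE or attempt == max_tries - 1:
                raise
            time.sleep(delay)
            delay *= 2   # 1s, 2s, 4s, ... instead of a fixed 1s
```

Dropping a helper like this into `_read_meta` would replace the fixed `time.sleep(1)` with a growing delay and still surface persistent failures.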

Very crude, but so far it's been working well: the file system has yet to crash after a few days of testing.  Lately, while streaming video, the retry occurs at varying intervals, typically 20-45 min apart, rarely 10-30 s apart for a couple of minutes.  Playback is fine with infrequent retries, but will buffer briefly if retries repeat within a few minutes, then continue normally after buffering (rare).

A real ACD backend would be ideal but for now this has prevented crashes and having to wait a couple of hours for fsck to complete.  On a side note, there is one advantage to this combination of s3ql, acd_cli, and overlayfs with Plex - media files can be stored locally first via the upper layer of overlayfs and given time for Plex to perform deep analysis of the files.  The files can be uploaded at will once the analysis is complete - the analysis would take much longer and chew up bandwidth if the files were immediately stored on ACD.

For those willing to experiment, hope this helps!
-Nikhil

Riku Bister

Mar 31, 2017, 3:56:18 AM
to s3ql

Nikhil, man, you made a valuable modification; on the first try this fixed the crash.
I saw the retry message in the log file right away. If I had been using the original s3ql without the modification, it would have crashed and needed an hour of fsck, lol.
Is there a way to improve log.info('OSError: %s, retrying' % e) to log more detail, like which file is the problem?
This really is a good temporary fix, thanks very much. I already asked for something like this in my previous post about local://, but the dev dismissed it right away.

This could be improved even more: I have a script that, when uploading, checks whether acd_cli is still mounted (yes, it crashes rarely, but it does sometimes). The same check could be added here, to verify that the directory exists and remount it on demand. That would make reading the file system even more reliable.

Riku Bister

Apr 2, 2017, 3:55:11 AM
to s3ql

After testing for some time, it is indeed much better, like 100x better. But new problems have appeared; of course I expected that. :)
We now need to find where we can retry on an HMAC mismatch. It is totally random, and the first time it happens the file system becomes read-only: after the first HMAC message, nothing can be written to the file system until I run fsck and remount it. Nothing more appears in the log. I was running an intensive media metadata check on the drive; it seems I can't do that. All of this is related to read problems: s3ql should just retry when it gets bad data, but it doesn't, it just fails or crashes. There is no error checking for this in the code at the moment.
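A retry on an HMAC mismatch would have to re-fetch the whole object and re-verify it, with a cap so genuinely corrupt objects still fail loudly. A rough standalone sketch of that idea (all names here are invented for illustration; this is not s3ql code):

```python
import time

class CorruptError(Exception):
    """Stands in for s3ql's HMAC-mismatch error; the name is invented."""

def fetch_verified(fetch, verify, max_tries=3, delay=1.0):
    # fetch() returns the raw object bytes; verify(buf) raises CorruptError
    # on an integrity (e.g. HMAC) failure. Both are caller-supplied.
    for attempt in range(max_tries):
        buf = fetch()
        try:
            verify(buf)
            return buf
        except CorruptError:
            if attempt == max_tries - 1:
                raise            # persistent corruption: give up loudly
            time.sleep(delay)    # transient bad read: fetch again
```

Whether retrying on an integrity failure is sound policy is a separate question (Nikolaus's earlier point about crashing loudly applies), but with a buggy acd_cli mount the bad data is usually transient.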

2017-04-02 00:31:20.311 2188:fuse-worker-11 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 00:33:52.621 2188:fuse-worker-29 s3ql.fs._readwrite: Backend returned malformed data for block 13 of inode 10785 (HMAC mismatch)
2017-04-02 00:39:44.771 2188:fuse-worker-4 s3ql.fs._readwrite: Backend returned malformed data for block 0 of inode 10782 (HMAC mismatch)
2017-04-02 00:40:50.619 2188:fuse-worker-3 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 00:41:01.669 2188:fuse-worker-3 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:07:34.093 2188:fuse-worker-16 s3ql.fs._readwrite: Backend returned malformed data for block 16 of inode 10773 (HMAC mismatch)
2017-04-02 01:10:36.720 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:10:49.613 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:11:01.563 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:11:31.853 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:36:53.096 2188:fuse-worker-12 s3ql.fs._readwrite: Backend returned malformed data for block 0 of inode 6530 (HMAC mismatch)
2017-04-02 01:42:52.386 2188:fuse-worker-27 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:42:58.635 2188:fuse-worker-27 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 01:53:09.669 2188:fuse-worker-8 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:42:53.766 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:43:04.944 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:43:20.940 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:43:53.985 2188:fuse-worker-22 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:51:18.443 2188:fuse-worker-27 s3ql.fs._readwrite: Backend returned malformed data for block 7 of inode 7141 (HMAC mismatch)
2017-04-02 02:56:49.378 2188:fuse-worker-3 s3ql.fs._readwrite: Backend returned malformed data for block 15 of inode 7129 (HMAC mismatch)
2017-04-02 02:58:07.603 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 02:58:16.970 2188:fuse-worker-20 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 03:13:59.453 2188:fuse-worker-30 s3ql.fs._readwrite: Backend returned malformed data for block 0 of inode 7164 (HMAC mismatch)
2017-04-02 03:22:43.542 2188:fuse-worker-15 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 03:59:14.248 2188:fuse-worker-4 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:02:26.486 2188:fuse-worker-23 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:04.004 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:10.287 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:27.620 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:14:49.440 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:15:24.850 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:17:07.127 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:19:13.815 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:20:07.171 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying

-------------
Also, there is another OSError 70 crash, but this one is related to mount.py or something else:

2017-04-02 04:20:07.171 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 04:23:01.560 2188:fuse-worker-16 s3ql.backends.local._read_meta: OSError: [Errno 70] Communication error on send, retrying
2017-04-02 05:45:45.014 12311:Metadata-Upload-Thread s3ql.mount.run: File system unchanged, not uploading metadata.
2017-04-02 05:45:46.216 12370:Metadata-Upload-Thread s3ql.mount.run: File system unchanged, not uploading metadata.
2017-04-02 05:46:19.323 2188:fuse-worker-10 llfuse.(unknown function): handler raised <class 'OSError'> exception ([Errno 70] Communication error on send), terminating main loop.
2017-04-02 05:46:19.781 2188:MainThread s3ql.mount.unmount: Unmounting file system...
2017-04-02 05:46:19.911 2188:MainThread root.excepthook: Uncaught top-level exception:

Traceback (most recent call last):
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/block_cache.py", line 744, in _get_entry
    el = self.cache[(inode, blockno)]
KeyError: (8619, 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/s3ql/s3ql-2.21_modattu/bin/mount.s3ql", line 26, in <module>
    s3ql.mount.main(sys.argv[1:])
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/mount.py", line 213, in main
    llfuse.main(workers)
  File "src/fuse_api.pxi", line 343, in llfuse.main (src/llfuse.c:35968)
  File "src/handlers.pxi", line 322, in llfuse.fuse_read (src/llfuse.c:14852)
  File "src/handlers.pxi", line 323, in llfuse.fuse_read (src/llfuse.c:14801)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/fs.py", line 1043, in read
    tmp = self._readwrite(fh, offset, length=length)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/fs.py", line 1120, in _readwrite
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/block_cache.py", line 722, in get
    el = self._get_entry(inode, blockno)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/block_cache.py", line 779, in _get_entry
    backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/backends/common.py", line 108, in wrapped
    return method(*a, **kw)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/backends/common.py", line 319, in perform_read
    res = fn(fh)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/block_cache.py", line 768, in do_read
    shutil.copyfileobj(fh, el, BUFSIZE)
  File "/usr/lib/python3.5/shutil.py", line 73, in copyfileobj
    buf = fsrc.read(length)
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/backends/comprenc.py", line 594, in read
    buf = self._read_and_decrypt(size - len(outbuf))
  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/backends/comprenc.py", line 549, in _read_and_decrypt
    buf = self.fh.read(size)
OSError: [Errno 70] Communication error on send

nikhilc

Apr 3, 2017, 1:32:34 PM
to s3ql
Hi Riku,

On Sunday, April 2, 2017 at 2:55:11 AM UTC-5, Riku Bister wrote:

We need to find where we can retry on an HMAC mismatch. It is totally random, and when it first happens the file system becomes read-only; after the first HMAC message, nothing can be written to the file system until I run fsck and remount it.

I haven't seen this error in my usage - it may be worth creating a new filesystem to verify if the problem is actually a file transfer issue or an issue with your existing filesystem.

Also, you seem to be getting much more frequent communication errors than I've experienced.  Do you have acd_cli configured to retry on errors?  Here's my config, but to avoid derailing the thread, refer to acd_cli support for this aspect:

acd_client.ini:
[transfer]
fs_chunk_size = 1310720
chunk_retries = 5
connection_timeout = 10
idle_timeout = 20

fuse.ini:
[read]
open_chunk_limit = 500
timeout = 10
 
Also there is another crash came 70 OSerror but this is releated on mount.py or something else
[...]

  File "/root/s3ql/s3ql-2.21_modattu/src/s3ql/backends/comprenc.py", line 549, in _read_and_decrypt
    buf = self.fh.read(size)
OSError: [Errno 70] Communication error on send

I ran across this as well and added a retry to comprenc.py.  The error hasn't shown up again, so for now this hack (and really, these are crude hacks) is untested.  I've committed the changes at:  https://bitbucket.org/taligentx/s3ql

Hope this helps,
Nikhil

Riku Bister

Apr 4, 2017, 5:58:35 AM
to s3ql
Heya, thanks for the answer. Okay, I haven't changed the acd_cli settings, but... keep reading.

I want to share what I'm doing now:
Since the HMAC mismatch errors make the s3ql drive read-only, I decided to try a more modern client (an actively developed project):
rclone mount. My errors went down by a factor of 100 to 1. I've been testing and tweaking for a couple of days now.
I completely replaced the read-only acd_cli mount with an rclone mount using this command:
fusermount -u /mnt/amazon; rclone mount --read-only --allow-other --acd-templink-threshold 0 --stats 5s --buffer-size 256M -v remote:/ /mnt/amazon

One thing I can say: fsck seeking is a lot faster. Instead of waiting more than 30 minutes, it can now do a check in 5 minutes or so; the file scanning improvement is huge. acd_cli is badly outdated, while rclone has native support for retrying when a bad read comes back. I have been reading other threads on the internet and it seems people have moved to it. See this topic: https://forum.rclone.org/t/best-mount-settings-for-streaming-plex/344
I'm still testing, and if this keeps working well I'm going to dump acd_cli, but I'll keep the option to go back.
The rclone mount has now been up for about 24 hours and 100 gigabytes have been transferred (media metadata scanning) without a single crash or error in s3ql.

I bet rclone could (maybe) also work as the read/write backend for s3ql; I haven't tried yet. The idea would be to mount the s3ql data folder and keep the database on the local fs, since acd_cli can't be used read/write under s3ql. Need to try this out later :)
