Local S3QL host disk filled up


Chris Davies

Jun 17, 2019, 11:46:28 AM
to s3ql
I'm running S3QL 3.0 (Debian) on a locally hosted filesystem, so no S3 involved this time.

Unfortunately over the weekend my host disk filled up. I've since extended the disk but I still can't get my S3QL filesystem to recover.

The original error was triggered through the mounted filesystem. Here's the relevant log:

2019-06-15 13:26:28.724 2293:Metadata-Upload-Thread s3ql.mount.run: Dumping metadata...
2019-06-15 13:26:29.542 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..objects..
2019-06-15 13:27:09.527 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..blocks..
2019-06-15 13:31:16.480 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..inodes..
2019-06-15 13:34:34.279 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..inode_blocks..
2019-06-15 13:36:28.091 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..symlink_targets..
2019-06-15 13:36:28.111 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..names..
2019-06-15 13:37:50.069 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..contents..
2019-06-15 13:40:28.757 2293:Metadata-Upload-Thread s3ql.metadata.dump_metadata: ..ext_attributes..
2019-06-15 13:43:58.038 2293:Metadata-Upload-Thread s3ql.metadata.upload_metadata: Compressing and uploading metadata...
2019-06-15 13:46:04.242 2293:Metadata-Upload-Thread s3ql.metadata.upload_metadata: Wrote 73.5 MiB of compressed metadata.
2019-06-15 13:46:04.244 2293:Metadata-Upload-Thread s3ql.metadata.upload_metadata: Cycling metadata backups...
2019-06-15 13:46:04.245 2293:Metadata-Upload-Thread s3ql.metadata.cycle_metadata: Backing up old metadata...
2019-06-16 06:12:54.030 2293:Thread-9 s3ql.mount.exchook: Unhandled top-level exception during shutdown (will not be re-raised)
2019-06-16 06:12:53.817 2293:Thread-5 root.excepthook: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 279, in perform_write
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 457, in do_write
    fh.write(buf)
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 370, in write
    self.fh.write(buf)
  File "/usr/lib/s3ql/s3ql/backends/local.py", line 323, in write
    self.fh.write(buf)
OSError: [Errno 28] No space left on device



During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/s3ql/s3ql/mount.py", line 58, in run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 445, in _upload_loop
    self._do_upload(*tmp)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 472, in _do_upload
    % obj_id).get_obj_size()
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 108, in wrapped
    return method(*a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 279, in perform_write
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 389, in __exit__
    self.close()
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 380, in close
    self.fh.write(buf)
  File "/usr/lib/s3ql/s3ql/backends/local.py", line 323, in write
    self.fh.write(buf)
OSError: [Errno 28] No space left on device
2019-06-16 06:12:54.179 2293:Thread-7 s3ql.mount.exchook: Unhandled top-level exception during shutdown (will not be re-raised)
2019-06-16 06:12:54.329 2293:Thread-9 root.excepthook: Uncaught top-level exception:


I have an automatic fsck that gets triggered whenever an S3QL mount fails (typically from "transport not connected" errors), and it fired even though this was a local disk. A simplified sketch of the trigger follows the log:

2019-06-17 09:37:12.780 24686:MainThread s3ql.fsck.main: Starting fsck of local:///var/autofs/misc/s3ql/field/
2019-06-17 09:37:12.833 24686:MainThread s3ql.fsck.main: Using cached metadata.
2019-06-17 09:37:12.834 24686:MainThread s3ql.fsck.main: Remote metadata is outdated.
2019-06-17 09:37:12.834 24686:MainThread s3ql.fsck.main: Checking DB integrity...
2019-06-17 10:12:35.967 24686:MainThread root.excepthook: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/fsck.s3ql", line 11, in <module>
    load_entry_point('s3ql==3.0', 'console_scripts', 'fsck.s3ql')()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 1269, in main
    backend['s3ql_seq_no_%d' % param['seq_no']] = b'Empty'
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 197, in __setitem__
    self.store(key, value)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 307, in store
    self.perform_write(lambda fh: fh.write(val), key, metadata)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 108, in wrapped
    return method(*a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 278, in perform_write
    with self.open_write(key, metadata, is_compressed) as fh:
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 274, in open_write
    fh = self.backend.open_write(key, meta_raw)
  File "/usr/lib/s3ql/s3ql/backends/local.py", line 107, in open_write
    dest.write(b's3ql_1\n')
  File "/usr/lib/s3ql/s3ql/backends/local.py", line 323, in write
    self.fh.write(buf)
OSError: [Errno 28] No space left on device
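
The trigger is essentially the following. This is a simplified sketch rather than the real script, and the mount point path is illustrative:

#!/bin/sh
# Simplified sketch of the auto-fsck hook (illustrative paths, not the
# real script). If the mount point stops responding, force the unmount
# and run fsck.s3ql in batch mode so it never sits waiting for input.
FS='local:///var/autofs/misc/s3ql/field/'
MNT='/srv/field'   # hypothetical mount point

if ! stat "$MNT" >/dev/null 2>&1; then
    umount.s3ql "$MNT" 2>/dev/null || fusermount -u -z "$MNT"
    fsck.s3ql --batch "$FS"
fi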



I've now extended the underlying disk, but I'm being told that the S3QL filesystem is still mounted elsewhere. It isn't: I can guarantee this is the only system with access to the host storage, because it's a local disk. If I run the fsck manually I get told the locally cached metadata is out of date, and asked to confirm I want to wipe it out. I really don't want to do that because I've got over 600MB still in the local cache waiting to be uploaded to the S3QL filesystem. The cache is on a different disk to the S3QL filesystem, so I'm really not sure why the local cache is considered out of date with respect to the remote.

Help please!

Thanks,
Chris

Chris Davies

Jun 17, 2019, 11:55:55 AM
to s3ql
Additional information: since this is a locally hosted filesystem, it's trivial to get the underlying list of files (objects). Here's what I see:

# ls -ltA
total 603056
-rw-r--r--   1 root root        0 Jun 17 10:12 s3ql_seq_no_11
-rw-r--r--   1 root root 77085524 Jun 15 13:46 s3ql_metadata
-rw-r--r--   1 root root 71856921 Jun 15 13:46 s3ql_metadata_bak_0
-rw-r--r--   1 root root 70460048 Jun 15 13:46 s3ql_metadata_bak_1
-rw-r--r--   1 root root 63798340 Jun 15 13:46 s3ql_metadata_bak_2
-rw-r--r--   1 root root 58605750 Jun 15 13:46 s3ql_metadata_bak_3
-rw-r--r--   1 root root 47143593 Jun 15 13:46 s3ql_metadata_bak_4
-rw-r--r--   1 root root 45975070 Jun 15 13:46 s3ql_metadata_bak_5
-rw-r--r--   1 root root 41974332 Jun 15 13:46 s3ql_metadata_bak_6
-rw-r--r--   1 root root 39478853 Jun 15 13:46 s3ql_metadata_bak_7
-rw-r--r--   1 root root 38954572 Jun 15 13:46 s3ql_metadata_bak_8
-rw-r--r--   1 root root 31040449 Jun 15 13:46 s3ql_metadata_bak_9
-rw-r--r--   1 root root 31040449 Jun 15 13:46 s3ql_metadata_bak_10
-rw-r--r--   1 root root      162 Jun 10 12:55 s3ql_seq_no_10
-rw-r--r--   1 root root      162 Jun 10 10:44 s3ql_seq_no_9
-rw-r--r--   1 root root      162 Jun  6 22:54 s3ql_seq_no_8
-rw-r--r--   1 root root      162 Jun  6 22:49 s3ql_seq_no_7
-rw-r--r--   1 root root      162 Jun  6 17:19 s3ql_seq_no_6
-rw-r--r--   1 root root      162 Jun  6 16:51 s3ql_seq_no_5
-rw-r--r--   1 root root      162 Jun  2 15:36 s3ql_seq_no_4
drwxr-xr-x 902 root root    36864 Jun  1 01:37 s3ql_data_
-rw-r--r--   1 root root      162 May 31 23:31 s3ql_seq_no_3
-rw-r--r--   1 root root      162 May 31 22:30 s3ql_seq_no_2
-rw-r--r--   1 root root      162 Feb 19 10:07 s3ql_seq_no_1



Chris

Nikolaus Rath

Jun 17, 2019, 2:55:47 PM
to s3...@googlegroups.com
On Jun 17 2019, Chris Davies <cdro...@gmail.com> wrote:
> If I run the fsck manually I get told the locally cached metadata is
> out of date, and asked to confirm I want to wipe it out. I really don't
> want to do that because I've got over 600MB still in the local cache
> waiting to be uploaded to the S3QL filesystem. The cache is on a different
> disk to the S3QL filesystem so I'm really not sure why the local cache is
> considered out of date with respect to the remote.

Could you please copy & paste the exact fsck output?

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Chris Davies

Jun 17, 2019, 5:05:00 PM
to s3ql
On Monday, 17 June 2019 19:55:47 UTC+1, Nikolaus Rath wrote:
> Could you please copy & paste the exact fsck output?


I thought I'd done that, but here's the chunk in its entirety:

2019-06-17 16:30:18.666 1585:MainThread s3ql.fsck.main: Starting fsck of local:///var/autofs/misc/s3ql/field/
2019-06-17 16:30:18.667 1585:MainThread s3ql.fsck.main: Ignoring locally cached metadata (outdated).
2019-06-17 16:32:44.304 1585:MainThread root.excepthook: Uncaught top-level exception:

Traceback (most recent call last):
  File "/usr/bin/fsck.s3ql", line 11, in <module>
    load_entry_point('s3ql==3.0', 'console_scripts', 'fsck.s3ql')()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 1201, in main
    elif sys.stdin.readline().strip() != 'continue, I know what I am doing':
KeyboardInterrupt

At this point I bailed out; reasonably enough, I hope.

I've been looking at your code (again) and the file timestamps, and it seems to me that removing s3ql_seq_no_11 might resolve the issue...?
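
To show my reasoning, here's the check (the output matches the listing in my earlier message):

# The newest sequence object is zero bytes, and its mtime (10:12) matches
# the failed fsck run in the log above, so it looks like residue of the
# interrupted write rather than a valid sequence marker:
ls -l /var/autofs/misc/s3ql/field/s3ql_seq_no_11
# -rw-r--r-- 1 root root 0 Jun 17 10:12 s3ql_seq_no_11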

Regards,
Chris

Nikolaus Rath

Jun 17, 2019, 5:22:29 PM
to s3...@googlegroups.com
On Jun 17 2019, Chris Davies <cdro...@gmail.com> wrote:
> On Monday, 17 June 2019 19:55:47 UTC+1, Nikolaus Rath wrote:
>>
>> Could you please copy & paste the exact fsck output?
>>
>>
> I thought I'd done that but here is the chunk in its entirety
>
> 2019-06-17 09:37:12.780 24686:MainThread s3ql.fsck.main: Starting fsck of
> local:///var/autofs/misc/s3ql/field/
[...]

I don't mean the log file; I mean the message that is printed on the
console where you're "being told that the S3QL filesystem is still
mounted elsewhere".

My best guess is that you're running fsck.s3ql / mount.s3ql with
different --cachedir settings. Otherwise the messages that you report
don't make sense.
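
In other words, both tools have to be pointed at the same cache directory, e.g. (the cache directory and mount point below are only examples):

# fsck.s3ql decides whether its locally cached metadata is current by
# comparing sequence numbers against the s3ql_seq_no_* objects in the
# backend, so mount and fsck need to share a cache directory:
mount.s3ql --cachedir /var/cache/s3ql \
    local:///var/autofs/misc/s3ql/field/ /srv/field
fsck.s3ql --cachedir /var/cache/s3ql \
    local:///var/autofs/misc/s3ql/field/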

Chris Davies

Jun 23, 2019, 8:56:01 AM
to s3ql
Hi Nikolaus,

On Monday, 17 June 2019 22:22:29 UTC+1, Nikolaus Rath wrote:
> I don't mean the log file; I mean the message that is printed on the
> console where you're "being told that the S3QL filesystem is still
> mounted elsewhere".
>
> My best guess is that you're running fsck.s3ql / mount.s3ql with
> different --cachedir settings. Otherwise the messages that you report
> don't make sense.

Thank you for your suggestion. Unfortunately I no longer have that console output, and I wasn't able to reproduce it.

Here's why. While I was waiting for your assistance I decided to take a parallel route. Because the S3QL filesystem in this instance is hosted on a local disk, I was able to take a VM snapshot of the entire system. I then removed (renamed) the zero-length s3ql_seq_no_11 file, and the fsck performed flawlessly. I now have my data, and I've been able to extend my host filesystem considerably. (If this hadn't worked I would have rewound the snapshot and continued here. Clearly not an option when using a real S3-based filesystem, though.)
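
For the record, the recovery amounted to this. It was only safe here because the backend is local and I had the snapshot to rewind to; the backup destination is arbitrary:

# 1. Take a VM snapshot (done outside the guest).
# 2. Move the zero-length sequence object aside, keeping a copy.
mv /var/autofs/misc/s3ql/field/s3ql_seq_no_11 /root/s3ql_seq_no_11.bad
# 3. Re-run fsck; with s3ql_seq_no_10 the newest remaining sequence
#    object, it accepted the cached metadata and completed cleanly.
fsck.s3ql local:///var/autofs/misc/s3ql/field/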

Regards,
Chris
