On 2020-07-09 at 04:25 +0200, Daniel Jagszent wrote:
> > [...] Has anybody tried to improve s3ql's caching mechanism to allow
> > partial download of blocks?
> Not that I know of. Any such implementation should be rock solid with
> regard to data integrity (that has priority over performance for S3QL
> AFAIK) and should survive an OS crash at any point in time.
> Since blocks are (optionally) compressed and encrypted, it's not that
> easy to discern the required byte range to receive from the object
> storage…
Ah yes, compression and probably encryption will indeed preclude any
sort of partial block caching. An implementation will have to be
limited to plain, uncompressed blocks, which is okay for my use case
though (borg provides its own encryption and compression anyway).
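With plain blocks, the byte range to request follows directly from the read. A small sketch, assuming fixed-size blocks (block n of a file covers bytes [n*B, (n+1)*B)) and that an uncompressed, unencrypted block is stored in its backend object byte-for-byte with no framing. Both are assumptions about S3QL's storage format, and `block_ranges` and `BLOCK_SIZE` are hypothetical names:

```python
# Hypothetical sketch: split a file-level read into per-block byte ranges.
# Assumes fixed-size blocks and that a plain (uncompressed, unencrypted)
# block is stored in its object byte-for-byte, so in-block offsets are
# also offsets into the stored object.
BLOCK_SIZE = 10 * 1024 * 1024  # illustrative value only


def block_ranges(offset, length, block_size=BLOCK_SIZE):
    """Yield (block_no, start, end) covering [offset, offset + length); end is exclusive."""
    end = offset + length
    while offset < end:
        block_no, start = divmod(offset, block_size)
        stop = min(block_size, start + (end - offset))
        yield block_no, start, stop
        offset += stop - start
```

For example, with 4-byte blocks a 10-byte read from offset 0 touches blocks 0 and 1 completely and the first two bytes of block 2.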
Regarding stability, I don't see why such a cache would be inherently
unstable. It adds complexity, yes, but so does everything else.
I was thinking of doing something on top of sparse files:
- every partially cached block gets its own sparse file
- every sparse file gets a map file that records which data is present
in the form of a list of byte ranges
- sparse files get a different naming scheme (e.g. ${blockid}.partial)
to prevent S3QL from ever mistaking a partially downloaded block for a
fully cached one, and so do map files (e.g. ${blockid}.map)
- application reads are compared against the existing map to determine
if new data has to be downloaded
- backend reads are probably aligned down and rounded up to a minimum
viable I/O size (configurable), then passed through as partial
downloads
- the sparse file is allocated if it does not exist (open, truncate),
the range is downloaded into the file as is (seek, write) and the file
is fsynced
- a new map file is created (${blockid}.map.new), the updated list of
ranges is serialized into it, and the file is atomically renamed to
${blockid}.map
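The steps above can be sketched roughly as follows. This is a minimal sketch with several assumptions: the map file is JSON holding sorted, non-overlapping [start, end) ranges, `fetch(start, end)` stands in for a ranged GET against the backend, `IO_SIZE` is the configurable minimum I/O size, and `block_len` is the stored size of the plain block. All names are hypothetical and none of this is S3QL's API. It is Unix-only because of `os.pwrite`.

```python
import json
import os

IO_SIZE = 64 * 1024  # hypothetical default for the configurable minimum I/O size


def load_map(path):
    """Return cached ranges as a sorted list of (start, end) tuples."""
    try:
        with open(path) as f:
            return [tuple(r) for r in json.load(f)]
    except FileNotFoundError:
        return []


def save_map(path, ranges):
    """Write ${blockid}.map.new, fsync it, then atomically rename it over the map."""
    tmp = path + ".new"
    with open(tmp, "w") as f:
        json.dump(ranges, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def merge(ranges):
    """Coalesce overlapping or adjacent [start, end) ranges."""
    out = []
    for s, e in sorted(ranges):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def missing(have, start, end):
    """Return the parts of [start, end) not covered by the sorted ranges in have."""
    gaps, pos = [], start
    for s, e in have:
        if e <= pos:
            continue
        if s >= end:
            break
        if s > pos:
            gaps.append((pos, s))
        pos = max(pos, e)
    if pos < end:
        gaps.append((pos, end))
    return gaps


def align(start, end, io_size, block_len):
    """Align start down and end up to io_size, never past the end of the block."""
    return start // io_size * io_size, min(-(-end // io_size) * io_size, block_len)


def write_range(path, block_len, offset, data):
    """Allocate the sparse file if needed, write the range in place, then fsync."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size != block_len:
            os.ftruncate(fd, block_len)  # extends without allocating data blocks
        view = memoryview(data)
        while view:
            n = os.pwrite(fd, view, offset)
            view, offset = view[n:], offset + n
        os.fsync(fd)
    finally:
        os.close(fd)


def read_partial(cache_dir, block_id, block_len, start, end, fetch, io_size=IO_SIZE):
    """Serve [start, end) of a block, downloading only the ranges the map lacks."""
    base = os.path.join(cache_dir, str(block_id))
    partial, map_path = base + ".partial", base + ".map"
    end = min(end, block_len)
    if start >= end:
        return b""
    have = load_map(map_path)
    gaps = missing(have, start, end)
    # Aligned requests may overlap cached data; rewriting identical bytes is harmless.
    for a, b in merge([align(s, e, io_size, block_len) for s, e in gaps]):
        data = fetch(a, b)
        if len(data) != b - a:
            raise IOError("short ranged download")
        write_range(partial, block_len, a, data)  # data is durable first...
        have = merge(have + [(a, b)])
        save_map(map_path, have)  # ...then the map that advertises it
    with open(partial, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

The ordering carries the crash-safety argument: data is fsynced before the map that claims it is replaced, and the new map is fsynced before the rename. After a crash the map can therefore only under-report what is present, which costs a re-download but never serves unwritten bytes.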
Naturally, there has to be some logic to determine if it is worthwhile
to perform a partial download at all, as well as logic to promote
partial blocks to full blocks.
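One possible shape for those two pieces of logic, again only a hedged sketch. The 50 % threshold is an arbitrary hypothetical knob. `full_path` stands in for wherever S3QL would keep a fully cached block, because its real cache naming is not assumed here. The function names are made up for illustration.

```python
import os

# Hypothetical knob: if more than this fraction of the block would have to be
# fetched anyway, a plain full download is assumed to be the better choice.
FULL_FETCH_THRESHOLD = 0.5


def partial_worthwhile(missing_bytes, block_len, threshold=FULL_FETCH_THRESHOLD):
    """Decide between a ranged download and fetching the whole block."""
    return missing_bytes < threshold * block_len


def maybe_promote(cache_dir, block_id, block_len, have, full_path):
    """Turn ${blockid}.partial into a fully cached block once the map covers all of it."""
    if have != [(0, block_len)]:
        return False
    base = os.path.join(cache_dir, str(block_id))
    # Each range was fsynced when written; move the data into place first.
    os.replace(base + ".partial", full_path)
    # A crash here leaves only an orphaned map, which a cleanup pass can delete.
    os.remove(base + ".map")
    return True
```

Renaming before deleting the map means an interrupted promotion never loses cached data. At worst, a stale ${blockid}.map is left behind.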
Does this all sound plausible?