pkg/blobserver/diskpacked: Implement proper delete


tgulacsi78

Jul 26, 2021, 5:23:10 AM
to Perkeep
Hi,

I'd like to implement proper deletion for the diskpacked blobserver,
so that it can be used for more volatile use cases (for example, as the "loose" storage of blobpacked, not just the "packed" one).

Because diskpacked stores many blobs contiguously in one big file (a pack),
deletion is not straightforward. Currently we just overwrite the head (the hash)
with "0000" and the data with zeroes, and skip such blobs on reindex.
But this does not free up space.

My idea is to append the remaining (live) blobs to the storage once the deleted blobs in a particular .pack file reach some threshold.
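
To make the idea concrete, here is a minimal sketch of the copy step in Go. The `record` type, `compact` and `compactBytes` are hypothetical names for illustration only; they do not reflect the real diskpacked on-disk format or index. The sketch only shows that live blobs are copied and deleted ones are skipped.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// record is a hypothetical index entry describing one blob inside a pack.
// It is NOT the real diskpacked layout, just enough to illustrate the idea.
type record struct {
	offset, size int64
	deleted      bool
}

// compact copies every live (non-deleted) record from src to dst, in order,
// and returns the number of bytes written.
func compact(dst io.Writer, src io.ReaderAt, recs []record) (int64, error) {
	var written int64
	for _, r := range recs {
		if r.deleted {
			continue // the space of deleted blobs is what we reclaim
		}
		n, err := io.Copy(dst, io.NewSectionReader(src, r.offset, r.size))
		written += n
		if err != nil {
			return written, err
		}
	}
	return written, nil
}

// compactBytes is a small in-memory wrapper around compact, for demonstration.
func compactBytes(data []byte, recs []record) ([]byte, error) {
	var dst bytes.Buffer
	_, err := compact(&dst, bytes.NewReader(data), recs)
	return dst.Bytes(), err
}

func main() {
	// Three blobs; the middle one has been zeroed out by a delete.
	recs := []record{{0, 4, false}, {4, 4, true}, {8, 2, false}}
	out, err := compactBytes([]byte("aaaa0000cc"), recs)
	fmt.Printf("%q %v\n", out, err)
}
```

In a real implementation dst would be the pack currently being appended to, and the index would have to be updated with the new offsets before the old pack is dropped.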

Questions:
1. When should we do such garbage collection?
  In RemoveBlobs? Or only on Reindex?
2. What should be the threshold?
  50% of pack file size seems acceptable, with a minimum of some tens of MiBs.
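
A rough sketch of what such a threshold check could look like in Go. The name `shouldCompact` and the 32 MiB value are assumptions, not existing Perkeep code, and "minimum" is read here as a floor on the reclaimable bytes (it could just as well be a floor on the pack size).

```go
package main

import "fmt"

// minReclaimBytes is a hypothetical floor: do not bother rewriting a pack
// unless at least this many bytes would be freed ("some tens of MiBs").
const minReclaimBytes = 32 << 20 // 32 MiB, an assumed value

// shouldCompact reports whether a pack of packSize bytes, of which
// deletedBytes belong to deleted blobs, is worth garbage collecting.
func shouldCompact(packSize, deletedBytes int64) bool {
	if packSize <= 0 || deletedBytes < minReclaimBytes {
		return false
	}
	// deletedBytes/packSize >= 50%, written without floating point.
	return deletedBytes >= packSize-deletedBytes
}

func main() {
	fmt.Println(shouldCompact(1<<30, 600<<20)) // 600 MiB deleted of 1 GiB
	fmt.Println(shouldCompact(1<<30, 100<<20)) // 100 MiB deleted of 1 GiB
	fmt.Println(shouldCompact(40<<20, 30<<20)) // 75% deleted, but under the floor
}
```

The same check works whether it is called from RemoveBlobs or only during Reindex; only the call site changes.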

Problems:
1. Today the code assumes that the pack files are numbered sequentially, without gaps. So either
  a) we leave a 0-length pack file in place of the garbage-collected one,
  b) or we rewrite the pack in place (the temp-file-and-rename dance),
  c) or we rewrite the code to drop that assumption and allow holes.

2. To know when to GC a pack file, we have to index the deleted blobs' places and sizes, too.
  Or at least maintain the deleted ratio per pack.
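
For the cheaper variant (only a deleted ratio per pack), a sketch in Go could be as small as the following. `packStats` and `deletedIndex` are hypothetical names, kept in memory here; persisting them alongside the existing index is left open.

```go
package main

import "fmt"

// packStats holds hypothetical per-pack byte counters.
type packStats struct {
	totalBytes   int64 // bytes of all blobs ever written to the pack
	deletedBytes int64 // bytes of blobs that have since been deleted
}

// deletedIndex maps a pack number to its counters.
type deletedIndex map[int]*packStats

func (d deletedIndex) stats(pack int) *packStats {
	s, ok := d[pack]
	if !ok {
		s = &packStats{}
		d[pack] = s
	}
	return s
}

// addBlob records that a blob of the given size was written to pack.
func (d deletedIndex) addBlob(pack int, size int64) {
	d.stats(pack).totalBytes += size
}

// markDeleted records that a blob of the given size was deleted from pack.
func (d deletedIndex) markDeleted(pack int, size int64) {
	d.stats(pack).deletedBytes += size
}

// ratio returns the deleted fraction of pack, or 0 for an unknown or empty pack.
func (d deletedIndex) ratio(pack int) float64 {
	s, ok := d[pack]
	if !ok || s.totalBytes == 0 {
		return 0
	}
	return float64(s.deletedBytes) / float64(s.totalBytes)
}

func main() {
	idx := deletedIndex{}
	idx.addBlob(0, 100)
	idx.addBlob(0, 300)
	idx.markDeleted(0, 300)
	fmt.Println(idx.ratio(0))
}
```

This is enough to decide when to GC, but not to find the deleted regions; the GC pass would still have to scan the pack and skip the zeroed blobs, as reindex does today.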

Ideas, suggestions, objections?

Thanks in advance,
Tamás