I've just opened a WIP PR against Perkeep in the hope of motivating a design discussion (about local blob storage, max blob sizes, and perhaps other topics). Here's the PR description. Cheers!
This PR is a preliminary sketch for a new blobserver type that uses files uploaded to it as their own storage.
When you add a file that lives within the directory tree controlled by an fsbacked.Storage, an entry mapping the file to its blobref is added to a database, but the file's contents are not copied anywhere. When the file's content blob is fetched later, the database directs the Storage to the right local file and the data is served from there.
Adding a file outside the directory tree, or any other kind of blob, falls through to another blobserver nested inside the fsbacked.Storage.
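The receive-side decision can be sketched roughly as follows. This is a toy illustration of the idea, not Perkeep's actual blobserver API: the type, method, and field names here (Storage, receiveFile, index) are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Storage is a hypothetical stand-in for fsbacked.Storage: it controls
// a root directory and keeps a blobref -> path index instead of
// copying file contents into its own store.
type Storage struct {
	root  string            // directory tree this Storage controls
	index map[string]string // blobref -> absolute file path
}

// receiveFile records a file that already lives under root. A file
// outside root is not kept here; the real implementation would
// delegate it to the nested blobserver.
func (s *Storage) receiveFile(ref, path string) (kept bool) {
	abs, err := filepath.Abs(path)
	if err != nil {
		return false
	}
	if !strings.HasPrefix(abs, s.root+string(os.PathSeparator)) {
		return false // outside the tree: fall through to nested storage
	}
	s.index[ref] = abs // remember where the bytes live; copy nothing
	return true
}

func main() {
	s := &Storage{root: "/data/videos", index: map[string]string{}}
	fmt.Println(s.receiveFile("sha224-abc", "/data/videos/kids/clip1.mp4")) // true
	fmt.Println(s.receiveFile("sha224-def", "/tmp/other.bin"))              // false
}
```

On fetch, the same index is consulted in reverse: look up the blobref, open the recorded path, and serve the bytes directly from disk.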
This solves the problem of wanting to add a tree of large files (e.g., videos of my kids growing up) to a local Perkeep instance without storing all the data twice. This should be used only on directory trees whose files do not change, lest the blobrefs in the database become mismatched to their corresponding files.
A number of other changes throughout Perkeep would be needed to make this truly useful. The io.Reader presented to a blobserver's ReceiveBlob method is usually (always?) some wrapper object (like checkHashReader) that conceals the underlying *os.File; without access to that file, fsbacked.Storage cannot detect that a file within its tree is being uploaded. And in any case, Perkeep imposes rather a low limit on blob sizes for this purpose.
Presented for further discussion.
I guess the rolling hash produces small chunks intentionally. The Perkeep source code mentions 16MB as the maximum chunk size in some places.

Splitting the file is the intended behavior. It has some advantages compared to just hashing the complete file.