Hi,
I've just watched Brad and Mathieu's LinuxFest Northwest 2018 talk on
Perkeep, https://youtu.be/PlAU_da_U4s, and have a couple of questions.
Say a `pk get $hash1' shows some Perkeep metadata as JSON.  If I were
to `pk put' some text that happened to be valid JSON Perkeep metadata,
I assume Perkeep would initially treat it as genuine when re-building
the index from just the blobs.  Can faulty metadata cause problems,
e.g. an incorrect schema, or references to blobs that don't exist?  If
not, because such problems are ignored on the assumption the blob
wasn't real Perkeep-authored metadata after all, doesn't that mean
genuine problems, e.g. ones caused by a bug, might go undetected at
this stage?
I understand the rolling-checksum deduplication that Perkeep already
does. Are the resulting 0-16 MiB blobs ever compressed when stored?
Has any thought been given to deduplication at other granularities?
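If I've understood the dedup right, re-putting identical content
stores nothing new because the chunk blobrefs already exist, e.g.
(digest invented and abbreviated):

    $ pk put file big.iso
    sha224-7b02...
    $ pk put file big.iso
    sha224-7b02...    # same blobref; no new blobs written

whereas content merely contained within other content never chunks
out identically, so nothing is shared.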
Given,

    foo.png
    bar.pdf has foo.png within it
    xyzzy.mbox has a base64'd bar.pdf within it
    xyzzy.mbox.gz is exactly a gzip'd xyzzy.mbox
    all.tar has all the above

it's conceivable that some background process could continually scan
the blobs for dedupe opportunities.  Is that something that could fit
Perkeep's model, or does the default lack of blob deletion (for good
reasons) get in the way?
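The gzip case seems the easiest to spot; here's a sketch of the kind
of check such a scanner might make, using the file names above:

    # If the digests match then xyzzy.mbox.gz's bytes could, in
    # principle, be replaced by a reference to xyzzy.mbox's chunks
    # plus whatever gzip parameters are needed to reproduce the
    # compressed stream byte-for-byte.
    $ gzip -dc xyzzy.mbox.gz | sha224sum
    $ sha224sum xyzzy.mbox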
Lastly, https://perkeep.org could benefit from an up-to-date `here's
some of the things you could use it for' on the front page.  I can
find https://perkeep.org/doc/uses but it's probably out of date and
doesn't touch on the tantalising answers in the Q&A at the end of the
presentation: read from three PKs, write to the one with space, etc.
An up-to-date list of importers would be good too, as many might
arrive with a social-media site in mind, e.g. Google+ given its
declared demise.  Why I'd be interested in using it could be better
sold on the first page I reach.
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy