No apologies necessary - it's an important issue. When you talk about trust, I think you mean determining which envelopes you "trust" enough to process and import into your system? You want to be sure you have reliable metadata from trusted publishers? Is that right?
The process today is as you describe. You single out publishers and manually trust their keys somewhere in your import pipeline (aka build a whitelist of trusted keys).
The next part of your question is how to extract trusted data from LR (steps 3-5). Steps 3 and 4 aren't quite right. I think it would look like this:
3) Filter out all records that don't have a signature matching one of my whitelist keys (or a specific key, if I'm just importing for one publisher)
4) Validate every one of the remaining envelopes using LR Signature, throwing away any whose validation fails
I think you want to do it in this order b/c LR Signature is much more computationally intensive than just doing text matches against envelope strings, so throw away as many records as possible before signature validating the remainder.
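Roughly, the ordering I mean could be sketched like this in Python. This is only a sketch under assumptions: I'm assuming each envelope is a dict carrying a "digital_signature" block with a "key_location" list, that the whitelist is a set of key URLs, and that `verify` is whatever callable you wrap around LR Signature (I'm not showing its real API here). The function name and the example URLs are made up for illustration.

```python
from typing import Callable, Iterable, Iterator, Set


def claimed_key_locations(envelope: dict) -> Set[str]:
    """Return the key URLs an envelope claims to be signed with.

    Assumes the signature block is stored under "digital_signature"
    with a "key_location" list; adjust if your envelopes differ.
    """
    signature = envelope.get("digital_signature") or {}
    return set(signature.get("key_location") or [])


def filter_then_validate(
    envelopes: Iterable[dict],
    whitelist: Set[str],
    verify: Callable[[dict], bool],
) -> Iterator[dict]:
    """Yield envelopes that name a whitelisted key AND pass verification."""
    for envelope in envelopes:
        # Step 3: cheap set/string match, discards most records.
        if not (claimed_key_locations(envelope) & whitelist):
            continue
        # Step 4: expensive cryptographic check, only on the survivors.
        # `verify` is a placeholder for your LR Signature wrapper.
        if verify(envelope):
            yield envelope
```

The cheap match only looks at what the envelope claims, so it is not a trust decision by itself; the `verify` step still has to check the signature against the whitelisted key.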
I think there is a better way to handle what you're describing in step 2, by using tools beyond slice, which is (as you note) limited in functionality. Walt has a nice extractor tool chain built up that offers more control over what you're pulling (and some worker queue stuff to manage it). I think this is the repo:
https://github.com/wegrata/lr-data
If it looks interesting, let me know and we can draw Walt into the conversation. I think the main pain point for you is that your extraction capability from LR is limited by slice, and that's probably too basic a tool for what you want to do? Thoughts?
Steve
p.s. Regarding whitelists of keys, I think where we want to get to is orgs publishing whitelists of the keys they trust, and signing each list as an envelope. This way, I could import your whitelist to use as a starting place for mine. This serves as a nice discovery mechanism for publishers, as well as creating a "web of trust" where I'm trusting "friends of friends."
We don't have a way to do that today but I think the first step would be creating a metadata format for sharing FoaF data that includes PGP keys. Doesn't seem hard. We could hack this into schema.org by just doing something like:
Person.name = "Steve Midgley"
Person.knows = "NSDL"
and then new record:
Person.name = "NSDL"
Person.url = "[url to PGP key]"
"knows" isn't saying the same thing as "trusts", but if you put "knows" in an envelope it seems fairly clear what's being communicated? Or we could profile a new property, "trusts", into schema.org and do it that way (less standard but more accurate/clear)?
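To make the "friends of friends" idea concrete, here is a rough Python sketch of how an importer might walk records like the ones above to collect key URLs. Everything in it is an assumption for illustration: the records are plain dicts with "name", "knows" and "url" fields (not a real schema.org serialization), the function name is made up, and the hop limit is just one way to bound how far trust extends. It also assumes each record arrived in an envelope whose signature you have already validated.

```python
from typing import Dict, Iterable, List, Set


def trusted_key_urls(records: Iterable[dict], start: str, max_hops: int = 2) -> Set[str]:
    """Collect key URLs for everyone reachable from `start` via "knows".

    max_hops=1 trusts only direct friends; max_hops=2 adds friends of friends.
    """
    by_name: Dict[str, dict] = {r["name"]: r for r in records}
    seen: Set[str] = {start}
    frontier: List[str] = [start]
    urls: Set[str] = set()
    for _ in range(max_hops):
        next_frontier: List[str] = []
        for name in frontier:
            for friend in by_name.get(name, {}).get("knows", []):
                if friend in seen:
                    continue  # already handled; also guards against cycles
                seen.add(friend)
                next_frontier.append(friend)
                url = by_name.get(friend, {}).get("url")
                if url:
                    urls.add(url)
        frontier = next_frontier
    return urls
```

The resulting set could then serve as the starting whitelist for the import filter, to be pruned or extended by hand.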