Signing and Trust

finke...@gmail.com

May 21, 2013, 3:50:32 PM
to learnin...@googlegroups.com
I have gone through all the threads containing signing information, as well as the technical specifications, and have read that there is nothing built into LR for providing trust. So I was hoping to get ideas from the experts and other consumers of LR on how to do something like this. I hope this wasn't already covered.

Signer, curator and submitter cannot be used for any type of trust, because there is nothing restricting their values. The only thing we could use them for is slicing, but since we can only slice by either identity or any_tags, that doesn't really help.

To be up front: we are not running our own node, just using harvest and slice to get at the data.

So let's say I wanted to ingest all of Khan Academy's documents that have the keyword safety in them. Here is how I see it:
1) I would figure out where Khan Academy's key is, download it, and trust it.
2) Since I can only slice by one thing, I'll slice by any_tags=safety.
3) Loop through every record returned, comparing the key_owner in the digital signature, since we can know it prior to the ingest. This should filter out all documents except Khan Academy's own and any that are pretending to be Khan Academy (since they used the same key_owner as Khan).
4) Finally, use LR Signature to verify each remaining document against our trusted public keys to make sure no one is pretending to be Khan.
5) I now have a list of documents that I know are from Khan Academy and have the keyword safety, after possibly going through tens of thousands of documents.
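A rough Python sketch of steps 2 and 3 follows. The node URL and the key_owner value are hypothetical, paging through large result sets is omitted, and it assumes each slice result has already been unwrapped to its envelope (the dict carrying digital_signature). The key_owner check is only a cheap pre-filter: since anyone can write any key_owner, it narrows the set but proves nothing on its own.

```python
from urllib.parse import urlencode

# Hypothetical node URL; substitute the node you actually harvest from.
NODE_URL = "https://node.example.org"


def slice_url(any_tags: str) -> str:
    """Build a slice request for a single tag (step 2). Paging is omitted."""
    return f"{NODE_URL}/slice?{urlencode({'any_tags': any_tags})}"


def matches_key_owner(envelope: dict, expected_owner: str) -> bool:
    """Cheap string comparison on digital_signature.key_owner (step 3).

    This does NOT prove authorship: key_owner is self-asserted, so every
    survivor still needs full signature verification (step 4).
    """
    sig = envelope.get("digital_signature") or {}
    return sig.get("key_owner") == expected_owner


def candidates(envelopes: list, expected_owner: str) -> list:
    """Keep only envelopes whose claimed key_owner matches."""
    return [e for e in envelopes if matches_key_owner(e, expected_owner)]
```

Unsigned envelopes (no digital_signature at all) are dropped by the same check, which is usually what you want for a trust pipeline.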

Does this sound like a reasonable solution, or is there a better way? Also, I was just thinking that it would be really useful to be able to slice by the key_owner or one of the identities together with any_tags, to make this slice a lot smaller than getting every document that contains safety.

If this has been covered before, sorry for bringing it up again.

Dave Finke - NSDL

Steve Midgley

May 24, 2013, 6:54:33 PM
to learnin...@googlegroups.com
No apologies necessary; it's an important issue. When you're talking about trust, I think you mean determining which envelopes you "trust" enough to process them for import into your system? You want to be sure you have reliable metadata from trusted publishers? Is that right?

The process today is as you describe. You single out publishers and manually trust their keys somewhere in your import pipeline (aka build a whitelist of trusted keys).

The next part of your question is how to extract trusted data from LR (steps 3-5). Steps 3 and 4 aren't quite right. I think it would look like this:

3) Filter out all records that don't have a signature matching my whitelist keys or a specific key if I'm just importing for one publisher
4) Validate every one of the remaining envelopes using LR Signature, throwing away any whose validation fails

I think you want to do it in this order b/c LR Signature is much more computationally intensive than just doing text matches against envelope strings, so throw away as many records as possible before signature validating the remainder.
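The ordering described above can be sketched as a two-stage filter in Python. This assumes the whitelist is a set of key_location URLs (you could equally key on fingerprints) and that `verify` is a stand-in for a real check such as LRSignature; neither the function name nor its parameters come from an LR library.

```python
from typing import Callable, Iterable, List, Set


def trusted_envelopes(
    envelopes: Iterable[dict],
    whitelist: Set[str],
    verify: Callable[[dict], bool],
) -> List[dict]:
    """Return envelopes that pass a cheap whitelist match AND full verification.

    `verify` is a hypothetical hook for the expensive cryptographic check;
    it is only ever called on envelopes that survived stage 1.
    """
    survivors = []
    for env in envelopes:
        sig = env.get("digital_signature") or {}
        # Assumption: key_location is a list of URLs, as in the LR envelope.
        locations = sig.get("key_location") or []
        # Stage 1: cheap string match against the whitelist.
        if not any(loc in whitelist for loc in locations):
            continue
        # Stage 2: expensive signature validation; discard failures.
        if verify(env):
            survivors.append(env)
    return survivors
```

Because `verify` only sees envelopes that matched the whitelist, the PGP work scales with the number of candidate envelopes rather than with the size of the whole slice.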

I think there is a better way to handle what you're describing in step 2, by using tools beyond slice, which is (as you note) limited in functionality. Walt has a nice extractor tool chain built up that offers more control over what you're pulling (and some worker queue stuff to manage it). I think this is the repo: https://github.com/wegrata/lr-data

If it looks interesting, let me know and we can draw Walt into the conversation. I think the main pain point for you is that your extraction capability from LR is limited by slice, and that's probably too basic a tool for what you want to do? Thoughts?

Steve

p.s. Regarding whitelists of keys, I think what we want to get to is orgs publishing whitelists of the keys they trust and signing each one as an envelope. That way, I could import your whitelist as a starting place for mine. This serves as a nice discovery mechanism for publishers, as well as creating a "web of trust" where I'm trusting "friends of friends."
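The "friends of friends" idea amounts to a bounded breadth-first walk over published whitelists. A minimal Python sketch, assuming each org's whitelist has already been fetched and its own envelope signature verified; the data shape (a dict from a key to the set of keys its owner trusts) and the function name are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def expand_trust(
    my_keys: Set[str],
    published_whitelists: Dict[str, Set[str]],
    max_depth: int = 1,
) -> Set[str]:
    """Expand a starting whitelist through other orgs' published whitelists.

    max_depth=1 means "friends of friends": keys trusted by keys I trust.
    Larger depths trust progressively more distant keys.
    """
    trusted = set(my_keys)
    frontier = deque((k, 0) for k in my_keys)
    while frontier:
        key, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # do not follow this key's own whitelist any further
        for friend in published_whitelists.get(key, set()):
            if friend not in trusted:
                trusted.add(friend)
                frontier.append((friend, depth + 1))
    return trusted
```

Keeping the depth small is a deliberate choice: each extra hop adds keys you have never reviewed yourself.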

We don't have a way to do that today, but I think the first step would be creating a metadata format for sharing FOAF (friend-of-a-friend) data that includes PGP keys. Doesn't seem hard. We could hack this into schema.org by just doing something like:

Person.name = "Steve Midgley"
Person.knows = "NSDL"

and then a new record:
Person.name = "NSDL"
Person.url = "[url to PGP key]"

"knows" isn't saying the same thing as trust, but if you put "knows" in an envelope it seems fairly clear what's being communicated? Or we could profile a new "trusts" property into schema.org and do it the (less standard but more accurate and clear) way?

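For concreteness, here is roughly how the two records sketched above might look as schema.org JSON-LD, built in Python. Only the standard `knows` and `url` properties are used; a `trusts` property would be a non-standard extension and is not shown. The key URL is a placeholder and the helper name is hypothetical.

```python
import json


def person_record(name, knows=None, url=None):
    """Build a minimal schema.org Person record as a JSON-LD dict."""
    record = {"@context": "https://schema.org", "@type": "Person", "name": name}
    if knows:
        # schema.org "knows" expects a Person, so nest one rather than a bare string.
        record["knows"] = {"@type": "Person", "name": knows}
    if url:
        record["url"] = url
    return record


steve = person_record("Steve Midgley", knows="NSDL")
# Placeholder URL standing in for "[url to PGP key]".
nsdl = person_record("NSDL", url="https://example.org/nsdl-public-key.asc")

# Each serialized record could become the resource_data of its own envelope.
payloads = [json.dumps(r, indent=2) for r in (steve, nsdl)]
```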



--
You received this message because you are subscribed to the Google Groups "Learning Registry Developers List" group.

finke...@gmail.com

May 28, 2013, 12:20:43 PM
to learnin...@googlegroups.com
Steve,
Thanks for your reply and suggestions. I took a look at Walt's code, and unfortunately I'm not positive it will work for us, since it's based on dates and harvesting all the data. NSDL will probably be very selective about what data we pull into our system. So let's say we pull Khan today; then in a few months we might decide Navnorth's data is solid enough to import into our system, so we might grab that. Harvesting by date and having to go through every record to find the matching ones might not be feasible on our end, but I'll double-check with John.

What I think would be really useful, and could make a great UI later on, is if LR either created another mandatory field called organization, or just based it on the digital signature's key owner (something that we know probably won't change very often). Then either add that field to the slice index or allow listRecords/harvest to filter by it. Consumers could then grab just one publisher's data. In the future, someone could write a nice UI that is just a drop-down of all the key owners or organizations (depending on which way you would want to go). Doing this would let everyone see who the publishers are and what data they provide. As a side note, we could do this today with identity, but those fields are just too wide open, and there are four of them.
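Until something like this exists on the node side, a consumer could approximate the drop-down client-side over envelopes it has already harvested. A Python sketch, assuming envelopes carry `digital_signature.key_owner`; the function name is hypothetical, and since key_owner is self-asserted, the list is only meaningful after signatures are verified.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def publisher_menu(envelopes: Iterable[dict]) -> List[Tuple[str, int]]:
    """List distinct signature key owners with document counts.

    Unsigned envelopes (no key_owner) are skipped.
    """
    counts = Counter(
        (env.get("digital_signature") or {}).get("key_owner")
        for env in envelopes
    )
    counts.pop(None, None)  # drop envelopes with no key_owner
    # Most documents first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```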

Also, we need to use Java for our data harvesting, so we are unable to use LRSignature. I based our envelope verification code on Navnorth's use of Bouncy Castle. Here is a link if others want to see it: https://github.com/navnorth/LRJavaLib/blob/master/src/com/navnorth/learningregistry/LRVerify.java. It doesn't yet match LRSignature when hashing the envelope, but I'm going step by step through LRSignature to find what I'm missing.
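For anyone chasing a hash mismatch, the Python outline below shows the general shape of the LR-PGP.1.0 hashing, assuming it is: strip node-added fields, bencode the remainder, then SHA-256 the bytes. The excluded-field list and the handling of booleans, nulls, and floats are assumptions to check line by line against the LRSignature source, not a verified port; the resulting hex digest would then be compared with the text inside the clear-signed block (also an assumption).

```python
import hashlib
from typing import Any

# ASSUMPTION: top-level fields dropped before hashing. Verify against
# LRSignature; this list is illustrative, not authoritative.
EXCLUDED_FIELDS = {
    "digital_signature", "_id", "_rev", "doc_ID", "publishing_node",
    "update_timestamp", "node_timestamp", "create_timestamp",
}


def bencode(value: Any) -> bytes:
    """Minimal bencode encoder: ints, strings, lists, dicts with sorted keys."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        # ASSUMPTION: booleans become the strings "true"/"false".
        return bencode("true" if value else "false")
    if value is None:
        # ASSUMPTION: null becomes the string "null".
        return bencode("null")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        data = value.encode("utf-8")
        return str(len(data)).encode("ascii") + b":" + data
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencode requires dict keys sorted as raw byte strings.
        items = sorted(value.items(), key=lambda kv: kv[0].encode("utf-8"))
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}; check LRSignature")


def envelope_hash(envelope: dict) -> str:
    """SHA-256 hex digest of the bencoded envelope minus excluded fields."""
    stripped = {k: v for k, v in envelope.items() if k not in EXCLUDED_FIELDS}
    return hashlib.sha256(bencode(stripped)).hexdigest()
```

Comparing the bencoded bytes (not just the final digest) between this outline, LRSignature, and the Java code is usually the fastest way to find which field or type is handled differently.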

thanks
Dave Finke - NSDL

Steve Midgley

May 28, 2013, 1:27:08 PM
to learnin...@googlegroups.com
Got it.

I think using the API layers provided by inBloom's Learning Registry Index project would then probably be more useful to you, though that code is not open source yet. I'll ping Jason Hoekstra and see if he knows when that code might be made available to you for development.

Steve

Jim Klo

May 28, 2013, 1:56:43 PM
to <learningreg-dev@googlegroups.com>
Steve,

Which part of it are you referring to: their LR Connector or the actual LRI-b API? LR Connector (the bridge between LR and LRI) is just a customization on top of Walt's work AFAIK, so it's still Python, and they are using LRSignature as well. Their browser was recently moved into production: http://browser.inbloom.org/

- JK

On May 28, 2013, at 10:27 AM, Steve Midgley <steve....@mixrun.com> wrote:

Steve Midgley

May 29, 2013, 10:15:05 AM
to learnin...@googlegroups.com
Thanks. Is the code underlying browser.inbloom.org open yet? I think that's what Dave would need in order to leverage that technology. He might find the LR Connector useful, but since, as you say, it's Python, maybe not. Just trying to point out the options; thanks for clarifying.

Steve

Jim Klo

May 29, 2013, 11:19:21 AM
to <learningreg-dev@googlegroups.com>
I'm told the code is coming, but there's no date yet AFAIK. My understanding is that it will be released without the inBloom identity requirement, but I'm not entirely positive.

- JK

Sent from my iPhone