Still more questions


Jerome Grimmer

Apr 30, 2012, 4:47:19 PM
to learnin...@googlegroups.com

Our group just keeps coming up with more questions to ask you guys…

 

1.       For tagging resources, are there standard keywords that are being used?

2.       Most of us on my team are coming from an RDBMS background.  It appears to us that the Learning Registry is a lot like a transaction log in an RDBMS.  Is there already code existing or a service existing that can walk through the “transaction log” for a resource and create a record that summarizes all the transactions, so we don’t have 50 organizations all developing the same thing?

3.       When you search the Learning Registry, does the data return include both metadata and paradata?

4.       Is there a way to only search metadata? How?

5.       Is there a way to only search paradata? How?

6.       Is there a way to get all metadata and paradata for a specific resource? How?

7.       If there is a change to the specification that affects API calls, how are other developers notified of this change?

 

Thanks,

 

Jerome Grimmer

Southern Illinois University Carbondale

2450 Foundation Drive Suite 100

Springfield, IL

Phone: 217-786-3010 ext. 5857

Toll-free: 1-800-252-4822 ext. 5857

jgri...@illinoisworknet.com

"Your words have power.  Use them wisely." --Unknown.

 

Jim Klo

Apr 30, 2012, 6:30:33 PM
to <learningreg-dev@googlegroups.com>
Keep the questions coming!

Answers inline below:

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International

On Apr 30, 2012, at 1:47 PM, Jerome Grimmer wrote:

Our group just keeps coming up with more questions to ask you guys…
 
1.       For tagging resources, are there standard keywords that are being used?

Not really; this is a 'crowd-sourced' feature, so what goes into keys should be discussed between you and the community you want to interact with.  While we were trying to discover what is already there, we built some stats on our node at SRI: http://learnreg1.sri.com:5984/resource_data/_design/lr-stats/index.html which shows the top weekly results for keys that exist (some keys, such as grade levels and schema formats, have been deliberately omitted from the count).
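
For anyone who wants to run a similar tally over envelopes they have already pulled down, here is a minimal Python sketch. It is not the code behind the lr-stats page; the sample documents and the exclusion list are made up for illustration, and the only thing taken from the envelope is its "keys" list.

```python
from collections import Counter

# Hypothetical sample envelopes; only the "keys" field matters here.
SAMPLE_KEY_DOCS = [
    {"keys": ["algebra", "video", "grade 8"]},
    {"keys": ["Algebra", "lesson plan"]},
    {"keys": ["video", "oai_dc"]},
]

# Assumed exclusion list, standing in for grade-level and schema-format terms.
EXCLUDED = {"grade 8", "oai_dc"}


def top_keys(docs, excluded=EXCLUDED, limit=10):
    """Count keys case-insensitively, skipping excluded terms."""
    counts = Counter()
    for doc in docs:
        for key in doc.get("keys", []):
            normalized = key.strip().lower()
            if normalized and normalized not in excluded:
                counts[normalized] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    for key, count in top_keys(SAMPLE_KEY_DOCS):
        print(f"{count:3d}  {key}")
```

Lower-casing the keys is a design choice for this sketch; a community vocabulary might well want to treat case as significant.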

2.       Most of us on my team are coming from an RDBMS background.  It appears to us that the Learning Registry is a lot like a transaction log in an RDBMS.  Is there already code existing or a service existing that can walk through the “transaction log” for a resource and create a record that summarizes all the transactions, so we don’t have 50 organizations all developing the same thing?

Not sure I completely follow, but because resource data documents are immutable in the current implementation (albeit not fully immutable per the spec; deletes and status changes could cause mutability, but those are not currently implemented), I can somewhat see what you mean by a transaction log.  Data Services (not defined in the spec, but see here: http://jimklo.github.com/LearningRegistry/data-services/index.html) will be included in the next release.  There are some sample data services for performing standards alignment, which could be tweaked for individual use.

Since there is no central control for Learning Registry, avoiding duplicated work would have to be coordinated via the community (probably the -dev or -collaborate lists).
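
To make the "walk the transaction log and summarize" idea concrete, here is a rough Python sketch that groups already-downloaded envelopes by resource. It is only an illustration, not an existing LR service: the sample documents are invented, and the field names (doc_ID, resource_locator, resource_data_type, node_timestamp) are assumed to match what your node returns.

```python
from collections import defaultdict

# Hypothetical envelopes, reduced to the fields this sketch reads.
SAMPLE_LOG = [
    {"doc_ID": "a1", "resource_locator": "http://example.org/video/1",
     "resource_data_type": "metadata", "node_timestamp": "2012-04-01T10:00:00Z"},
    {"doc_ID": "b2", "resource_locator": "http://example.org/video/1",
     "resource_data_type": "paradata", "node_timestamp": "2012-04-20T08:30:00Z"},
    {"doc_ID": "c3", "resource_locator": "http://example.org/lesson/9",
     "resource_data_type": "metadata", "node_timestamp": "2012-04-11T12:00:00Z"},
]


def summarize_by_resource(docs):
    """Build one summary record per resource_locator."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["resource_locator"]].append(doc)

    summary = {}
    for locator, envelopes in grouped.items():
        type_counts = defaultdict(int)
        for env in envelopes:
            type_counts[env.get("resource_data_type", "unknown")] += 1
        summary[locator] = {
            "doc_ids": [env["doc_ID"] for env in envelopes],
            "counts_by_type": dict(type_counts),
            # Timestamps in one ISO 8601 UTC format compare correctly as strings.
            "latest_timestamp": max(env.get("node_timestamp", "") for env in envelopes),
        }
    return summary


if __name__ == "__main__":
    for locator, record in summarize_by_resource(SAMPLE_LOG).items():
        print(locator, record)
```

Because documents are immutable in the current implementation, a summary like this can be rebuilt or extended incrementally as new envelopes arrive.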

3.       When you search the Learning Registry, does the data return include both metadata and paradata?

There is no "search" for Learning Registry.  There is a service called "slice" which does "search like things" but is not a search, in that it really supports only a few boolean operations which are crude at best:
• any "tags" (tags being a combination of identity fields, schema format, and keys, I believe)
• any tags + date range
• any tags + identity (identity being an aggregate term for any identity field; publisher, curator, owner, etc)
• any tags + identity + date range
• identity
• identity + date range

And yes, slice does return both metadata and paradata.  OAI-PMH is the only odd man out here: it only returns XML schema formats, and to my knowledge the only XML format for paradata that data exists for is NSDL's comm_para.  Most are using LR Paradata 1.0, which models ActivityStreams.
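
The combinations listed above can be sketched as a small URL builder. This is an illustration under assumptions, not a reference client: the parameter names (any_tags, identity, from, until) and the /slice path are how I'd expect a node to expose the service, so check them against your node's service documentation, and the node URL is a placeholder. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def slice_url(node, any_tags=None, identity=None, start=None, end=None):
    """Build a slice request URL for one of the supported combinations.

    At least one of any_tags or identity is required, since a date range
    on its own is not among the combinations listed above.
    """
    if not any_tags and not identity:
        raise ValueError("slice needs any_tags, identity, or both")
    params = {}
    if any_tags:
        params["any_tags"] = ",".join(any_tags)  # assumed parameter name
    if identity:
        params["identity"] = identity            # assumed parameter name
    if start:
        params["from"] = start                   # assumed parameter name
    if end:
        params["until"] = end                    # assumed parameter name
    return f"{node.rstrip('/')}/slice?{urlencode(params)}"


if __name__ == "__main__":
    print(slice_url("http://node.example.org",
                    any_tags=["paradata", "algebra"],
                    start="2012-04-01", end="2012-04-30"))
```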

Steve Midgley organized a recent HackDay to try out Amazon's new CloudSearch, which may be a viable option for providing a richer search on top of resource data.  Agilix has also built a proprietary solution that adds LR resource data to their Solr indexes for use in their commercial offering, and ADL has tinkered with ElasticSearch integration (http://www.elasticsearch.org/) with some success as well.  The challenge for all of these solutions is not so much indexing the resource data documents themselves as indexing the resource data document payloads.

4.       Is there a way to only search metadata? How?
5.       Is there a way to only search paradata? How?

These two kind of belong together. Navigation North has made a modification to slice to be able to slice on paradata; I'm not sure whether their enhancement is publicly available, or whether it's restricted to filtering metadata/paradata.  Additionally, ADL has their lr-data tool (https://github.com/adlnet/lr-data/), which would let you extract LR contents into other systems (I think they support SQL RDBMSs, MongoDB, [Lucene/Solr? not sure], and possibly a few others).

6.       Is there a way to get all metadata and paradata for a specific resource? How?

Yes, most services provide a means of telling the API whether the request_id is a resource_locator or a doc_ID.  Data Services uses the notion of resources vs. discriminators; see the documentation here: http://jimklo.github.com/LearningRegistry/data-services/index.html to understand the differences.
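
As a rough illustration of the resource_locator vs. doc_ID distinction, here is a Python sketch that builds a request URL for the obtain service. The flag names (request_ID, by_resource_ID, by_doc_ID) and the /obtain path are assumptions on my part and should be verified against the service document on your node; the node URL and IDs are placeholders, and nothing is sent.

```python
from urllib.parse import urlencode


def obtain_url(node, request_id, by="resource"):
    """Build an obtain URL that says how request_id should be interpreted.

    by="resource" treats request_id as a resource_locator (a URL);
    by="doc" treats it as an envelope doc_ID.
    """
    if by not in ("resource", "doc"):
        raise ValueError("by must be 'resource' or 'doc'")
    params = {
        "request_ID": request_id,                                    # assumed name
        "by_resource_ID": "true" if by == "resource" else "false",  # assumed name
        "by_doc_ID": "true" if by == "doc" else "false",            # assumed name
    }
    return f"{node.rstrip('/')}/obtain?{urlencode(params)}"


if __name__ == "__main__":
    # Everything published about one web resource.
    print(obtain_url("http://node.example.org", "http://example.org/video/1"))
    # One specific envelope (hypothetical doc_ID).
    print(obtain_url("http://node.example.org", "0123456789abcdef", by="doc"))
```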

7.       If there is a change to the specification that affects API calls, how are other developers notified of this change?

Each service has a service document tied to it that contains the service version; the version in that document will be updated when the implemented API changes. Currently there haven't been any changes to the spec that would affect the API calls.  There is a service implementation change for distribute, to be included in the next release, that significantly changes the design so that the implementation better aligns with the spec.  Otherwise most changes will be discussed on the -dev list or on the public design calls (which occur bi-weekly on Thursdays).


Keep asking questions!

- Jim

 
 

-- 
---
This message is posted from the Google Groups "Learning Registry Developers List" group. 
To post: learnin...@googlegroups.com
To unsubscribe: learningreg-d...@googlegroups.com

Midgley, Steve

Apr 30, 2012, 7:43:58 PM
to learnin...@googlegroups.com
Adding some comments to Jim's. Your questions are right on the money: these are the key questions for implementers. Please keep posting if we aren't getting you the right answers.
 
Steve
 

1.       For tagging resources, are there standard keywords that are being used?

 
SM: Keywords are pretty open territory right now. Where people have keywords from their existing metadata payloads, they generally put them in here as well. Keep in mind that for (virtually) every envelope there is a metadata payload, which usually has much more detail (more on this below).

2.       Most of us on my team are coming from an RDBMS background.  It appears to us that the Learning Registry is a lot like a transaction log in an RDBMS.  Is there already code existing or a service existing that can walk through the “transaction log” for a resource and create a record that summarizes all the transactions, so we don’t have 50 organizations all developing the same thing?

 
SM: I agree that LR is very similar to a transaction log, and if you think of it that way, a lot of the architecture makes sense, EXCEPT that it's a distributed transaction log, so we share activities from the t-log with every node that wants a copy.
 
SM: There is slice today, which can give you some query-like services, but the Data Services module (plus possibly Cloud Search or things like it) is the way we think we'll solve this problem sustainably (slice has too many technical limitations). Also, Applied Minds, funded by the Gates Foundation, is working on a "Learning Registry Index" which should offer some additional capabilities for extracting data (better to think about "slicing" or "extracting" than querying). Walt Grata on this list has developed some tools to pull the LR envelope into RDBMS columns in a traditional SQL database (Postgres maybe?), so that's a quick and dirty way to get stuff out of LR and into an environment where you can process it effectively.
 
SM: LR is a transport solution to help us all share metadata with each other. Fundamentally that's what we built it for. But of course if you can't figure out what's in it, and you can't get the stuff you want out of it, it's not very useful, so clearly we need to help the community figure this out too.

3.       When you search the Learning Registry, does the data return include both metadata and paradata?

 
SM: When you slice or extract data, you are definitely getting a range of envelopes. You can control the range of envelopes with various criteria. Most common would be: sender, date, keyword and schema_format.

4.       Is there a way to only search metadata? How?
5.       Is there a way to only search paradata? How?

 
SM: Rather than think of metadata vs. paradata, think of things along the lines of the elements in the envelope I mentioned above (sender, date, keyword and schema_format) plus any "signals" in the data you can reliably detect. One key concept is "what format is the payload in?" Generally speaking, paradata and metadata may come in different schema formats (though sometimes they are mashed together in one Dublin Core block, for example). You want to extract only the formats (and possibly elements within the formats) that are of use to you.
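
A minimal Python sketch of that kind of filtering over envelopes you have already retrieved is below. It is an illustration only, separate from the Data Services module: the sample documents are invented, and the field names resource_data_type and payload_schema are assumed to match the envelopes your node returns.

```python
# Hypothetical envelopes, reduced to the fields this sketch reads.
SAMPLE_ENVELOPES = [
    {"doc_ID": "a1", "resource_data_type": "metadata", "payload_schema": ["oai_dc"]},
    {"doc_ID": "b2", "resource_data_type": "paradata", "payload_schema": ["LR Paradata 1.0"]},
    {"doc_ID": "c3", "resource_data_type": "paradata", "payload_schema": ["comm_para"]},
]


def extract(docs, data_type=None, schemas=None):
    """Keep envelopes matching a data type and/or any of the given schema formats."""
    wanted = {s.lower() for s in schemas} if schemas else None
    result = []
    for doc in docs:
        if data_type and doc.get("resource_data_type") != data_type:
            continue
        doc_schemas = {s.lower() for s in doc.get("payload_schema", [])}
        if wanted is not None and not (wanted & doc_schemas):
            continue
        result.append(doc)
    return result


if __name__ == "__main__":
    # Only paradata envelopes, whatever their schema format.
    print([d["doc_ID"] for d in extract(SAMPLE_ENVELOPES, data_type="paradata")])
    # Only envelopes whose payload is in one specific format.
    print([d["doc_ID"] for d in extract(SAMPLE_ENVELOPES, schemas=["LR Paradata 1.0"])])
```

Filtering on the schema format rather than the metadata/paradata label is usually the safer signal, since it tells you how to parse the payload once you have it.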
 
SM: The data services stuff we are rolling out next week will let you write some pretty simple JavaScript statements that define the things you want to find and let you extract just that data. So if you only want paradata, you can write an extract that gives you just paradata (and actually just the kinds of paradata you want).

6.       Is there a way to get all metadata and paradata for a specific resource? How?

 
SM: Jim is referring to the difference between an LR envelope ID and a web resource ID (aka URL) such as a khanacademy.org video. You need to use the data services extractions to pull out all the metadata associated with a specific web resource. We built data services specifically to deal with this issue: "Show me all the rating data associated with a specific URL." Or "Show me all the curricular standards alignments associated with a given URL." Etc.

7.       If there is a change to the specification that affects API calls, how are other developers notified of this change?

 
SM: Also keep in mind that these services are not a unified system in the sense of a SQL server or whatever. You can run your own node, and replicate data to it from other LR nodes and then run all your own custom APIs to access the data. Then the only API that could break is the "distribute" function, which is fairly well locked down by our underlying implementation (called CouchDB) so we don't anticipate that changing very often and there will be a lot of noise if it does change.
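
For a sense of what the underlying CouchDB mechanism looks like, here is a Python sketch that prepares (but does not send) a replication request against CouchDB's own _replicate endpoint. This is plain CouchDB replication, not the LR distribute service itself; the host names are placeholders, and the resource_data database name is taken from the stats URL earlier in the thread.

```python
import json
from urllib.request import Request


def replication_request(local_couch, source_db, target_db, continuous=False):
    """Prepare a POST to CouchDB's /_replicate endpoint without sending it."""
    body = {"source": source_db, "target": target_db, "continuous": continuous}
    return Request(
        url=f"{local_couch.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = replication_request(
        "http://localhost:5984",                   # placeholder: your own CouchDB
        "http://node.example.org/resource_data",   # placeholder: remote node database
        "resource_data",                           # local database to copy into
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually run it: urllib.request.urlopen(req)
```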

Keep asking questions!
 
SM: Ditto!!

Jerome Grimmer

May 1, 2012, 4:02:01 PM
to learnin...@googlegroups.com

Jerome Grimmer asked:

1.       For tagging resources, are there standard keywords that are being used?

 

Jim Klo wrote:

 

Not really; this is a 'crowd-sourced' feature, so what goes into keys should be discussed between you and the community you want to interact with.  While we were trying to discover what is already there, we built some stats on our node at SRI: http://learnreg1.sri.com:5984/resource_data/_design/lr-stats/index.html which shows the top weekly results for keys that exist (some keys, such as grade levels and schema formats, have been deliberately omitted from the count).

 

Jerome writes:

This is great.  Very helpful.  Do you have such a page for actions and actors as well?  We are interested in developing a standard vocabulary that will be widely understood, and knowing what is already being used is a big help to us.

 

Your other answers to my questions were also very helpful.  This will give me much to work through in the coming days.

 

Thanks!!

 

 

Jerome Grimmer

"Everybody is a genius.  But if you judge a fish on its ability to climb a tree, it will live its whole life believing that it is stupid.” – Albert Einstein

Steve Midgley

May 1, 2012, 4:30:51 PM
to learnin...@googlegroups.com
The closest thing I've seen to a proposed vocabulary framework is attached. This was created by Susan Van Gundy while she was at NSDL (and possibly collaborators).

For specific examples we have the Modeling Paradata document: https://docs.google.com/document/d/19ZkVpxQn1O1dLhCZClkQkzvypziBI7gBytszTxgXmX0/edit?hl=en_US

There are a ton of very concrete JSON examples that you can use there. Where you see an example that is close but doesn't meet your needs please write in. We can give you edit rights to that document so you can add your extensions directly - but please write to the list first to be sure we coordinate with others who may be working on similar stuff.

Best,
Steve



Paradata Framework Proposal.ppt