Re: [Learning Registry: Collaborate] Metadata Terms of Service

Steve Midgley

Jun 22, 2013, 2:54:00 AM
to <learning-registry-collaborate@googlegroups.com>, learningregistry
Looping the main Learning Registry list in on this important question, because many folks on that list might not yet be uploading to LR but will (hopefully) have opinions on this issue. See John's question below. Here's my summary and some additional questions that seem related:

Would a non-commercial metadata license prevent use of NSDL content?
Would specifying a clear license / price for commercial use make it easier / better for commercial consumers of content?
How specific would the definition of non-commercial need to be to make qualified organizations comfortable they "fit" under the term?

Steve


On Fri, Jun 21, 2013 at 7:24 PM, John Weatherley <jwea...@ucar.edu> wrote:
Hi All,

We've been discussing here at NSDL about the Terms of Service that we attach to our metadata/data submissions to the public LR node. Currently, NSDL's terms are for non-commercial use only, similar to http://creativecommons.org/licenses/by-nc/3.0/

The ToS guidelines for the public LR indicate, however, that all submissions should be placed under one of four recognized ToS, all of which allow for commercial uses (see http://www.learningregistry.org/tos).

I'd be curious to hear whether anyone has made submissions that restrict use to non-commercial only. From a philosophical and practical point of view, would/does this prevent others from using NSDL metadata in a widespread manner?

-john


------------------------------------------------------
John Weatherley
National Science Digital Library (NSDL)

--
--
You received this message because you are subscribed to the Google
Groups "Learning Registry: Collaborate" group.
 
To post: learning-regis...@googlegroups.com
To unsubscribe:learning-registry-co...@googlegroups.com
 
For more options, visit this group at
http://groups.google.com/group/learning-registry-collaborate?hl=en?hl=en
 

Joshua Marks

Jun 22, 2013, 11:35:08 AM
to learning-regis...@googlegroups.com, learningregistry

All,

 

This is really tricky. You are talking about the rights to the metadata rather than the content being described in the metadata. (Meta-meta data?) It seems problematic to restrict sharing and use of metadata in an open system like LR. There is also the issue of fair use, where others can create the same or similar metadata under a different license or terms. My suggestion is to never restrict the metadata to non-commercial use, even if the content itself is restricted. Otherwise a single non-commercial record in the LR would prevent any commercial use of the Learning Registry. Presently there is no mechanism to know the rights or limitations on metadata in the LR (other than the TOS, which may not apply to other nodes managed by others), so by virtue of being published, metadata should be considered usable for both commercial and non-commercial purposes.

 

Another approach is to say LR is for non-commercial use only, which would preclude for-profit entities from using or integrating with the LR, as there would be no way for them to filter out the non-commercial metadata and keep only the records describing commercially compatible content.

 

A third approach is to add another element to the LR record specifying the license of the LR record itself, so a commercial entity can filter and exclude any non-commercial metadata or paradata record from its node, index or consuming application.

 

Joshua Marks

CTO

Curriki: The Global Education and Learning Community

jma...@curriki.org

www.curriki.org

US 831-685-3511

 

I welcome you to become a member of the Curriki community, to follow us on Twitter and to say hello on our blog, Facebook and LinkedIn communities.

Kelly Peet

Jun 27, 2013, 9:28:39 AM
to learning-regis...@googlegroups.com, learningregistry
My input on the topic starts with some foundational questions:

1) Without having looked very deeply, a quick glance at nsdl.org indicates funding by NSF. A quick glance at nsf.gov indicates funding by an act of Congress, which leads me (probably naively) to conclude the content was created using public funds. First question: how can a publicly funded organization hold copyright to its original works? The answer would seem to precede any discussion about metadata (or meta-meta data) copyright considerations.

NOTE: my question is born out of ignorance more than any attempt to be inflammatory.

2) Presuming copyright does prevail, would it violate a non-commercial use restriction if a for-profit company were to develop a product in which such metadata were given away freely, but the free product were a companion to, or integrated into, a product that makes a profit?

3) I guess a way to restate question #2 might be to ask about fair use as it relates to, say, search engines harvesting the published metadata for use in algorithms that improve the relevance of results pointing to the NSDL original works. The advertising revenue of such for-profit search engines certainly runs afoul of the non-commercial terms, yet this kind of use, under fair use, is foundational to today's largest internet properties. I guess I don't understand what is being protected in the meta (meta-meta) data here.
kelly

Steve Midgley

Jun 27, 2013, 2:19:15 PM
to <learning-registry-collaborate@googlegroups.com>, learningregistry
Hi Kelly,

Good to hear from you. I'll share my perspective/knowledge, as best as I can. I believe what I'm saying is factually based, ymmv..

When the federal gov't funds an organization to build something, generally speaking, and unlike commercial work-for-hire arrangements, the IP created by that work is owned by the organization that creates it. (The Army funds you to develop/build a tank, the Army gets the tank, you get the IP to build more tanks, with some restrictions of course.) This applies to for-profit entities as well as non-profits (and has been a very profitable arrangement for many for-profits in the past). The federal gov't can direct the organization to release the IP under some specific arrangement (like an open license) or it can just leave the disposition of IP to the entity's discretion. Often, inaction results in the latter course.

What counts as a violation of a non-commercial license is subject to much debate and confusion, which is why many IP policy organizations (and I) would prefer that organizations not use non-commercial licenses. Generally, commercial use is considered to be where an organization or individual makes money from the use. Sometimes this includes revenue generated by non-profit organizations, as well as individuals and corporations. You can read a study that Creative Commons did here: http://wiki.creativecommons.org/Defining_Noncommercial. It raises all kinds of painful questions like "If a small-time blogger puts up a non-commercial use photo on a website that has AdWords income, is that commercial use?" What if it's a "big-time" blogger?

The question of non-commercial use of the metadata is even trickier, as you point out, which is why I and others working on the Learning Registry REALLY WANT TO DISCOURAGE everyone from publishing metadata with non-commercial use restrictions. We think that the restrictions you place on metadata should be the same as those you place on Google when it crawls your site for its (decidedly commercial) search engine index. Generally, organizations place no restrictions on the use of the metadata that Google generates from crawling a website.

My recommendation is that you assign rights to your metadata with the same permissions as your website's Terms of Service (or more permissively). If your website's ToS prohibits commercial use, you might reconsider your ToS b/c you are basically saying that Google is violating your ToS and you are OK with that but you don't want anyone else to.

I'd personally prefer that everyone just use the LR (very) permissive license for metadata: http://old.learningregistry.org/information-assurances/ (that's a temporary URL, will be at www shortly). But I realize that not everyone is comfortable sharing metadata this liberally.

Ultimately the IP license that organizations choose is up to them, and the Learning Registry is a free distribution platform for metadata. But for consumers of metadata it is important to look at the TOS field in the LR envelope which is where the metadata license information is stored for any given piece of metadata. 

Steve



Diny Golder

Jun 27, 2013, 4:18:42 PM
to <learning-registry-collaborate@googlegroups.com>, learning-regis...@googlegroups.com, learningregistry
Greetings. I don't recall discussions about "restricting the use" of NSDL metadata. There may be resources that are copyrighted here and there, and the creators have the right to do what they want under NSF funding (commercial development is encouraged in NSF projects), but the NSDL program was designed to share metadata, and there are commercial education products out there that include NSDL metadata in their offerings, which is not restricted. If the creator decides to charge for the resource, I believe they can. NSDL projects were encouraged to become self-sustainable, hence charging for whatever came out of the grant. I know that we can charge for ASN data, some of which was created with NSF funds. Charging or not charging is our decision. But whether we charge or don't charge, commercial or non-commercial organizations can use ASN data. The licensing is up to us, and we expect users of ASN data to adhere to our licensing requirements.

Many thanks, Diny

Diny Golder ~ Sent from my iPad

Steve Midgley

Jun 27, 2013, 6:23:54 PM
to <learning-registry-collaborate@googlegroups.com>, learningregistry
I don't think NSDL has announced anything about how they license their metadata. I believe they raised this issue in the public forum to indicate they are trying to figure out how to share their metadata. Should they use a non-commercial metadata license? Should they restrict access to their metadata to only certain partners? I think they're trying to decide what's next.

In my opinion, orgs can get a lot of heat for releasing metadata into the open under restrictive licenses, but orgs that don't release their metadata *at all* in the open don't get talked about as much. I'd encourage us all to have an apples-to-apples comparison when evaluating the decisions that all orgs have to make. There are totally legitimate business reasons to restrict access to metadata, but releasing metadata under restrictive terms should get more credit in terms of "openness" than not releasing it at all.

And of course, Diny and ASN are highly permissive with their metadata (yay!), so credit to Jes and co for making their IP available to everyone to improve the open infrastructure in education.

Steve



Joshua Marks

Jun 27, 2013, 7:36:35 PM
to learning-regis...@googlegroups.com, learningregistry

Steve,

 

So it seems there is a technical deficiency: the LR content record would need to contain an additional listing of use rights for the metadata, in addition to the use rights for the content it describes. This would support your described approach of enabling release of metadata with additional or variable restrictions. Nodes or indexes used by commercial vendors would need to filter out all non-commercial metadata records (even if they point to more liberally licensed assets).

 

Now this brings up a related topic I have been wondering about: multiple GUIDs for the same asset. Let's say Khan publishes all his videos' metadata via LR. Let's also say that Curriki, OER Commons, EngageNY and the Illinois repository each have users who create content records for the same asset (a link to the same YouTube URL/ID), each described differently and also published to the LR. So now there are N different content records, with different metadata and GUIDs, about the same asset… Hmmmmm. Not as useful as we would like.
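One low-tech way a consumer might at least notice that several records point at the same video is to reduce each resource URL to its YouTube video ID before comparing. The sketch below is illustrative only and is not an LR feature: it assumes records arrive as hypothetical `(record_id, url)` pairs, and it recognizes only the common `watch?v=`, `youtu.be` and `/embed/` URL forms.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url):
    """Return the YouTube video ID for common URL forms, or None if not recognized."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&...
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/embed/"):
            # Embedded players: https://www.youtube.com/embed/<id>
            return parsed.path[len("/embed/"):] or None
    return None


def group_by_asset(records):
    """Group hypothetical (record_id, resource_url) pairs by underlying video.

    URLs that are not recognized as YouTube links are grouped by the raw URL.
    """
    groups = defaultdict(list)
    for record_id, url in records:
        key = youtube_video_id(url) or url
        groups[key].append(record_id)
    return dict(groups)
```

Given records whose URLs are `https://www.youtube.com/watch?v=abc123`, `https://youtu.be/abc123` and `http://youtube.com/embed/abc123`, all three record IDs land under the single key `abc123`. This only detects exact-same-video duplicates; it says nothing about which record, if any, should be treated as authoritative.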

 

The issue here is a fundamental misconception of what Open means. The publisher of content records should be the official copyright holder, at the hosting location of the actual source of the asset. It is problematic to have systems that are not content repositories, but rather act as referatories (links + metadata), generate GUIDs for content. But if the actual source location is not an LR publishing node, what is to be done?

Steve Midgley

Jun 27, 2013, 11:13:09 PM
to <learning-registry-collaborate@googlegroups.com>, learningregistry
Hey Joshua,

I think you are raising two (and a half) distinct issues..

1) Metadata has a license and content has a license - they are not always the same.
1A) Can non-commercial metadata be distributed over commercial LR nodes?
2) Resources on the internet/web can have multiple URL/URI identifiers.

Let me try my best on these, one at a time.. 

#1 - Metadata rights vs content rights. I don't think LR has a technical deficiency regarding metadata licensing. The LR envelope has a field called "TOS"; in this field, publishers provide a URL to the Terms of Service describing how others may use the metadata they have published into the LR network. This is required, but the TOS can be anything at all. If it is invalid or incomprehensible, then we recommend orgs do not consume/use those envelopes. This hasn't been a problem to date.

Optionally, if a publisher wishes to *also* indicate a license for their content in LR, this is easily done via whatever metadata schema is provided in the LR payload that describes the content. For LRMI, provide the field "useRightsURL" (cf. http://www.lrmi.net/the-specification) pointing to the rights to use the content itself (e.g. http://creativecommons.org/licenses/by/3.0/). For Dublin Core, use the "rights" field (http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=elements#rights).
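A consumer reading an envelope might keep the two licenses apart as follows. This is a minimal sketch under several assumptions: the envelope has already been parsed into a Python dict; the metadata ToS URL sits under `TOS` → `submission_TOS` (our reading of the LR envelope layout, worth checking against the current spec); the schema labels matched in `payload_schema` (`LRMI`, `DC`, `oai_dc`) are guesses; and, as a hypothetical simplification, `resource_data` is already a dict rather than the raw JSON or XML string a real payload would carry.

```python
def license_summary(envelope):
    """Report the metadata license and the content license of one LR envelope separately.

    Assumptions (see lead-in): the metadata ToS URL lives at envelope["TOS"]["submission_TOS"],
    and envelope["resource_data"] has already been parsed into a dict.
    """
    # Metadata license: set by the publisher on the envelope itself.
    metadata_tos = (envelope.get("TOS") or {}).get("submission_TOS")

    # Content license: carried inside the payload, in a schema-specific field.
    payload = envelope.get("resource_data") or {}
    schemas = {s.lower() for s in envelope.get("payload_schema", [])}
    content_license = None
    if "lrmi" in schemas:
        content_license = payload.get("useRightsURL")
    elif "dc" in schemas or "oai_dc" in schemas:
        content_license = payload.get("rights")

    return {"metadata_tos": metadata_tos, "content_license": content_license}
```

The point of returning both values is that a consumer checks them independently: a CC-BY video can be described by metadata under a more restrictive ToS, and vice versa.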

#1A - Non-commercial metadata distribution. I think you are also asking whether commercial vendors who operate LR nodes must avoid carrying non-commercial metadata on their nodes. I believe the answer to this is "no." Nodes are "dumb pipes" and so under DMCA (in my non-lawyerly but informed opinion) are classified as infrastructure. If a commercial operator were to *use* the metadata, then they would need to be sure they are not using non-commercial metadata (obviously). In fact, as I understand DMCA, if a commercial operator were to set up filtering by metadata, then that node *might* not be classed as a "dumb pipe," which could *increase* the potential liability. Providing a full feed of metadata on a node means the operator is not party to the metadata, just a "bystanding" infrastructure operator transporting it from A to B. LR was specifically designed to take advantage of this approach. I can't provide legal guarantees (no one can), but I talked about this in detail with several law professors who are experts in DMCA, copyright and the internet, so I think this issue is OK.

Did I address the deficiency you were describing - or can you clarify?

#2 - Multiple URLs that point to the same resource. This is a general problem, and not intrinsic to LR. LR does provide an opportunity to share what is known about duplicate URL IDs. Think of it as a federated HTTP 300 or something like that. W3C has a proposal for Schema.org to allow these statements: http://www.w3.org/wiki/WebSchemas/sameAs. My opinion is that's a pretty good model that we should just use. One remaining problem is processing the "sameAs" statements, but LR is a metadata delivery tool, not a processing tool. I think a few people are working with SPARQL (ASN) or Neo4j (inBloom/LR search), and those graph databases would be the best place to process sameAs information coming from LR. But LR allows us to tell each other what we know about sameAs URLs that might crop up, so we don't all have to figure it out on our own!
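Whatever store ends up processing them, consuming sameAs statements amounts to grouping URLs into equivalence classes. The in-memory sketch below swaps in a plain union-find structure in place of the graph databases mentioned above, purely for illustration; it assumes sameAs is symmetric and transitive, and the class name `SameAsIndex` is hypothetical.

```python
class SameAsIndex:
    """Union-find over URLs connected by sameAs statements (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def _find(self, url):
        # Unknown URLs start out as their own cluster.
        self.parent.setdefault(url, url)
        # Path halving keeps lookups fast as clusters grow.
        while self.parent[url] != url:
            self.parent[url] = self.parent[self.parent[url]]
            url = self.parent[url]
        return url

    def add_same_as(self, a, b):
        """Record a statement that URLs a and b identify the same resource."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_resource(self, a, b):
        """True if a chain of sameAs statements links a and b."""
        return self._find(a) == self._find(b)

    def clusters(self):
        """Return every known URL grouped into sets of equivalent URLs."""
        groups = {}
        for url in list(self.parent):
            groups.setdefault(self._find(url), set()).add(url)
        return list(groups.values())
```

Because statements from different publishers simply accumulate, two nodes that each know half of a chain (A sameAs B, B sameAs C) end up with A and C in the same cluster once both statements have flowed through LR.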

That said, I think there's a thornier problem, which is that in addition to "sameAs" URLs around the web, there are also "prettyMuchTheSameAs" URLs all over the web. They resolve to different websites, with different toolbars, logins, etc., but from a user's perspective each resource is "pretty much the same" as the other resources. If you embed a Khan Academy video on NSDL and on CA Dept of Ed's teacher portal, the user will see the same video and would probably not think of the two pages as significantly different. Right now the web doesn't have a mechanism to talk about pages that are pretty much the same, but not exactly the same. I think it would be great if we could start sharing prettyMuchTheSameAs information with each other through Learning Registry (and elsewhere) b/c it could greatly reduce the number of times a user is presented with two resources on a search results page that, for all the user cares, are equivalent. That's a bit down the road, but hopefully it makes sense what I'm getting at?

What do you think? Am I talking about the issues you're raising? 

Best,
Steve







Joshua Marks

Jul 13, 2013, 3:46:48 PM
to learning-regis...@googlegroups.com, learningregistry

Steve,

 

First off, my apologies for taking so long to respond to this rather challenging thread. Both other priorities and the nuances involved have given me pause to think. I will first answer the technical items then touch on the related legal and business complexities that fall from them.

 

Regarding #1- Well having a TOS reference gives a human the ability to inspect the license or rights and determine the ability or lack of ability to use in a commercial context (However that is defined by the publisher of the metadata). A system (LR node filter, search index, LMS) however, cannot.

 

Regarding 1A- I guess you are suggesting under DMCA this is an “Out” for liability because someone chose to publish the metadata and as long as the commercial system using it claims ignorance and an inability to filter, then there is no liability. In essence I think you are suggesting an implied license to the end user of any system (Commercial or not) by virtue of a publisher posting the records to the Learning Registry. This I fear is the realm of lawyers, which I am not. How one defines “Use” of the metadata is really up to the publisher as far as I can see, and they might consider indexing the metadata not acceptable. In the case of Google, this is why they check a robots.txt file before indexing a domain. They have come to the position that Google will index you unless you use a structured method to tell them not to.

 

Regarding #2- Agreed, this multiplicity is intrinsic; however, it is also not helpful, for all the reasons you indicate and more. This is particularly so when you are talking about frameworks and identifiers. The more synonyms and mappings one needs to track, the slower things get and the more data errors and noise increase. Also, as with source code, too much forking and versioning of content makes merging changes harder or impossible.

 

So yes, you are talking to the issues I am raising. But there is another related issue that you seem to be skirting. Publishers who make their metadata, content or services available only for non-commercial use do so in large part because their business and sustainability depend on being able to sell that metadata, content or service to commercial vendors who can create commercial value with it. Alternately, they use it to drive traffic and use of their complete solutions and services, and generate revenue that way (directly or indirectly). Anyone who does this is highly disincentivized to provide this metadata to the Learning Registry, as it undercuts their sustainability… unless they can be assured that commercial actors will not use it in ways they do not want. So we have a legal stalemate that keeps non-government actors who value their metadata from publishing it to the Learning Registry.

 

IMHO, the Learning Registry metadata record should have the equivalent of robots.txt in it, allowing the publisher to say explicitly how the metadata can be used by any system, particularly search engines, commercial or non-commercial. Without such rights-based filtering capability at the node level (along with filtering by source and format/content policy), there is no reason to have more than one common node, and that node would effectively become the single open repository for all fully open published metadata and paradata (and no private or even non-commercial use metadata).
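To make the proposal concrete, here is one hypothetical shape such a robots.txt-like policy could take. Nothing here exists in the LR envelope today: the `usage_policy` key, the consumer types (`commercial`, `noncommercial`, `*`) and the use names are all invented for illustration, and treating a missing policy as open is an assumption that mirrors how crawlers treat a site with no robots.txt.

```python
def is_use_permitted(record, consumer_type, use):
    """Check a hypothetical robots.txt-style usage policy attached to an LR record.

    consumer_type: e.g. "commercial" or "noncommercial" (hypothetical labels)
    use: e.g. "search_index", "redistribute", "display" (hypothetical labels)
    """
    policy = record.get("usage_policy")
    if policy is None:
        # Assumption: no policy means open use, mirroring a site with no robots.txt.
        return True
    # A rule group for the specific consumer type wins over the "*" default,
    # the way a named User-agent group overrides "User-agent: *".
    rules = policy.get(consumer_type, policy.get("*", {}))
    if use in rules.get("disallow", []):
        return False
    allowed = rules.get("allow")
    # No allow list means anything not explicitly disallowed is permitted.
    return allowed is None or use in allowed
```

With a policy such as `{"commercial": {"disallow": ["search_index", "redistribute"]}, "*": {"allow": ["search_index", "redistribute", "display"]}}`, a commercial consumer may display the record but not index or redistribute it, while a non-commercial consumer may do all three. As with robots.txt itself, such a check only helps consumers who choose to honor it.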

 

There are no answers here, just questions. Sorry about that.

 

Joshua Marks

Diny Golder

Jul 13, 2013, 4:55:27 PM
to <learning-registry-collaborate@googlegroups.com>, learningregistry
Goodness, has Stuart (Sutton) put in his 2 cents yet? I think he is considered an expert in this area as an intellectual property lawyer. And he teaches in the areas of information law and policy, legal informatics, and the organization of information. 

I think this is right up his alley!

Many thanks, Diny

Diny Golder ~ Sent from my iPad

Jim Klo

Jul 14, 2013, 4:45:33 AM
to <learning-registry-collaborate@googlegroups.com>, learning-regis...@googlegroups.com, learningregistry
See inline comments below.

On Jul 13, 2013, at 12:46 PM, "Joshua Marks" <jma...@curriki.org> wrote:

> Regarding #1- Well having a TOS reference gives a human the ability to inspect the license or rights and determine the ability or lack of ability to use in a commercial context (However that is defined by the publisher of the metadata). A system (LR node filter, search index, LMS) however, cannot.

 


Seems like a bit of a red herring. The claim that a system could not whitelist/blacklist in the same manner a human can is not necessarily true. Discovering a new TOS requires human interpretation first, but once it has been interpreted, any mechanical system can be instructed how to honor it.



> Regarding 1A- I guess you are suggesting under DMCA this is an “Out” for liability because someone chose to publish the metadata and as long as the commercial system using it claims ignorance and an inability to filter, then there is no liability. In essence I think you are suggesting an implied license to the end user of any system (Commercial or not) by virtue of a publisher posting the records to the Learning Registry. This I fear is the realm of lawyers, which I am not. How one defines “Use” of the metadata is really up to the publisher as far as I can see, and they might consider indexing the metadata not acceptable. In the case of Google, this is why they check a robots.txt file before indexing a domain. They have come to the position that Google will index you unless you use a structured method to tell them not to.

 


I'm not sure robots.txt has any legal binding. There's been lots of discussion over the years, and other than being a "friendly convention" that many choose to adhere to, I'm not aware of anyone who has gotten in trouble for violating robots.txt. Again, I'm no lawyer, but IMO robots.txt is a weak defense, especially when blacklisting visitors you don't like is trivial to do. It's like leaving an open suitcase full of money on a sidewalk by itself, with a security camera in plain view recording passersby, and a sign on the suitcase that says "don't touch".

I assume that metadata falls into the same realm of copyright. You as the copyright holder have the burden to uphold your copyright, lest you lose your rights. Hence, unless you copyright the metadata and take steps to prevent infringement (enter binding contracts), including pursuing those who infringe, the law typically doesn't uphold your copyright for you. This is why the RIAA/MPAA try to go after everyone.



> So yes you are talking to the issues I am raising. But there is another related issue that you seem to be skirting. Publishers who make their metadata, content or services only available for non-commercial use do so in large part because their business and sustainability depend on being able to sell that metadata, content or service to commercial vendors who can create commercial value with it. Alternately, they use it to drive traffic and use of their complete solutions and services and generate revenue that way (Directly or indirectly). Anyone who does this is highly disincentivized to provide this metadata to the Learning Registry as it undercuts their sustainability… unless they can be assured that commercial actors will not use it in ways they do not want. So we have a legal stalemate limiting the ability for non-government actors who value their metadata from publishing it to the Learning Registry.

 


I have to disagree. If you are solely in the market of selling metadata that's publicly accessible, IMHO you've got a business model that's on life support. I don't want to get into a deep criticism of this business model, and I'm not saying it isn't being done; there have been good reasons for doing it in the past. But given the recent trends in social curation (Learni.st, Pinterest, tumblr, Yelp, Foursquare, Stack Exchange and others), where metadata (note I'm not referring to paradata) is just getting recreated and redistributed all over the place, the actual value of commercial metadata will at some point become more or less nil, and the value will lie in the interface and user experience of interacting with the data. That is, the metadata about resources on referatories like Curriki, OER Commons, Brokers of Expertise, Gooru, and others at some point begins to normalize and commoditize: you can only describe the same resource so many ways before it starts looking the same to the end user, regardless of minor differences (see Steve's "pretty much the same as" argument). The marketable differentiator then is the experience each uniquely provides in accessing or discovering resources. It becomes the same question as: why use Bing over Google? LR does not seek to provide the novel experience for accessing and experiencing the data. LR provides basic distribution of commodity metadata for learning resources.

Paradata is another story. I can agree that there is some market value to it; however, at a government level, I don't believe there is a way a state can capitalize on it. If you have a state-operated portal, it's not clear to me that the state can monetize the usage data, favorites, ratings, comments, etc. Hence, as long as sharing involves no PII and doesn't violate some law/policy, there seems to be no apparent negative fiscal impact (other than infrastructure and data transmission costs). A for-profit business, on the other hand, might have problems sharing this kind of data if it is directly tied to its bottom line, especially if it has a novel way of curating its paradata to improve its customer experience.

> IMHO, the Learning Registry metadata record should have the equivalent of Robots.txt in it, and allow the publisher to explicitly say how the metadata can be used by any system, and particularly search engines, commercial or non-commercial. Without such rights (As well as source and format/content policy) filtering capability at the node level, there is no reason to have more than one common node and that node would effectively become the single open repository for all fully open published metadata and paradata (And no private or even non-commercial use metadata).

 


The TOS property is that and can be used for such purposes. I'm just not sure using it exactly like a robots file makes sense, why not just use a metadata format that supports describing granular use rights or define it externally like a sitemap.xml? 

I would encourage people to set up specialty nodes or services that whitelist content by TOS. I think it could be a great value-add, and potentially a model for building a node with paid access to curated filters of the metadata.
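A whitelist node along these lines could keep human review in the loop: envelopes whose ToS has already been approved pass through automatically, and any ToS not yet seen is set aside for a person to read, which matches the "interpret once, then automate" point made earlier in this message. This is a sketch under stated assumptions: the `TOS` → `submission_TOS` location is our reading of the envelope format, the helper names and `doc_ID` values are hypothetical, and the URL normalization is deliberately minimal.

```python
def normalize_tos(url):
    """Minimal normalization: trim whitespace and any trailing slash."""
    return url.strip().rstrip("/")


def partition_by_tos(envelopes, approved_tos):
    """Split envelopes into accepted ones and a set of ToS URLs awaiting human review.

    Assumes each envelope carries its metadata ToS at envelope["TOS"]["submission_TOS"].
    """
    approved = {normalize_tos(u) for u in approved_tos}
    accepted, needs_review = [], set()
    for env in envelopes:
        tos_url = (env.get("TOS") or {}).get("submission_TOS")
        if not tos_url:
            # No usable ToS at all: skip the envelope rather than guess.
            continue
        key = normalize_tos(tos_url)
        if key in approved:
            accepted.append(env)
        else:
            # Unknown ToS: hold it back until a human has read and classified it.
            needs_review.add(key)
    return accepted, needs_review
```

A node operator would publish only `accepted` downstream, periodically review `needs_review`, and add newly approved ToS URLs to the whitelist, so each new ToS needs human interpretation exactly once.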

Cheers,

- JK

Steve Midgley

Jul 14, 2013, 10:42:43 PM
to learningregistry, <learning-registry-collaborate@googlegroups.com>
I think Jim provided a response more complete than what I offered, and I agree with his argument. I'll add that metadata can be licensed precisely the way content on any website is licensed (except in cases where the metadata are *facts*, in which case you can't license them). And it faces the same legal issues: if you license your website "CC-By Non-commercial 3.0", how do you know that a commercial entity hasn't crawled your site using a Google crawler look-alike technology? You don't; you *trust* that businesses won't do that b/c if they get caught they would be breaking the law. Same for Learning Registry: if you don't want a commercial entity using your metadata, license it non-commercial. If your search results turn up in third-party sites, send them takedown notices. If anyone is profiting enough to be worth your time (meaning someone you'd want to sell your metadata to in the first place), I think you (and the community at large) will notice the abuse.

Regarding a white-list node based on ToS - Greg G was setting one up at CC, but he left for Wikimedia. I know CC would be interested in hosting this, if orgs like Curriki want to contribute?

The open public web presents all kinds of opportunities for abuse of content, and I'd argue that Learning Registry presents the same kinds of problems for metadata. But LR helps distribute and share metadata openly which opens up all kinds of interesting new solutions and products too (both non-commercial and commercial).

I'll be curious what Joshua and others think! This is an important discussion to hash out, I think.

Steve

Lisa Petrides

Jul 16, 2013, 10:25:30 AM
to learning...@googlegroups.com, <learning-registry-collaborate@googlegroups.com>
Hi Steve, et al.,

I've been reading with interest this current discussion about metadata and the Learning Registry (LR), though I must admit I do feel as if I'm on a merry-go-round that won't stop when I think about the dozens if not hundreds of conversations about the use of OER metadata that I have been part of over the past ten years: do we need it, how should it be licensed, who owns it, etc. Full disclosure, I'm a metadata geek, and believe that metadata itself becomes a resource once it is enhanced, refined, rated, reviewed, aligned, and associated with user-generated content and use patterns. Also, as someone who happily built a career on studying and implementing ways for educators to develop a practice of continuous learning based on transparency of data and information, getting involved in the OER movement was the perfect storm for increasing access to education.


As president of ISKME, which produced the first open education resource library, OER Commons, I've watched the sentiment on metadata go in and out like the tides. At first, OER metadata was coveted as the secret sauce necessary to keep resources searchable and discoverable. Back in 2004, with a grant from the Hewlett Foundation, our research team spent about nine months analyzing the metadata of various content creators, such as MITE, Connexions, Sofia, and Utah State University, to create a map of what was common, unique, etc. Then we looked across several content standards bodies (Dublin Core, IEEE, etc.) to map what was common, unique, and cutting edge. From that, the structure of OER Commons was born.


Over the years, we have actively curated content, shared resources, and nurtured partnerships that would enable us to build a commons for all educators and learners. This required countless hours of refinement, enhancement, and providing technical tutoring for terrific content creators who just wanted their work to be seen and shared by others. We realized that if OER were to be widely used, it would need to be described well, and as such, we had to make an investment in resource description.


Next came the Naysayer Period of metadata, when some argued that controlled vocabularies and rich descriptions would be rendered meaningless as new AI and machine learning techniques moved to center stage. From our perspective, however, working on the ground with teachers for the past decade, we keenly understood how teachers used "terms" to find resources, so we struggled to keep the conversation going about the importance of metadata. Our goal was to keep a vibrant ecosystem of teachers and learners engaged in this eccentric thing called OER.


Then, metadata became important again, but this time, on the sly. For example, Google convened daylong meetings to understand what we did with OER metadata, and they realized, "Wow, that's a lot of work."


And we'd respond, "Ask any librarian, of course it's work to curate meaningful, high-quality resources."


In other cases, various people and organizations asked to use our metadata to experiment with building their own tools and services. We felt this was kind of cool, because the field was still in its infancy, and experiments with recommender systems and the like were just beginning. Then came other repository-builders who used our metadata to duplicate what we'd done, and even though we would have liked to get some credit (this was frequently done "anonymously"), it was exciting to witness this uptake in OER occurring.


It was around this time, a few years back, that conversations about the licensing of OER metadata began to take place. While Creative Commons' legal team and others argued that one could not license metadata, period (which was somewhat true depending on how you define metadata, and mostly not true in countries outside the U.S.), sustainable business models in OER began to emerge.


Then, fast forward to a more robust OER environment, where people are actually using and reusing the stuff (!), and enter terms like paradata, descriptive data, resource data, etc., and it all starts to get a bit confusing. Why do I say this? Let's face it: there are more than a few elephants in the room.


Organizations like ours (and we are not alone) are having second thoughts about how much metadata they share and with whom. Some are becoming skittish, because the tides have turned once again and quality metadata is in high demand. We decided at OER Commons, for example, to place an "all rights reserved" notice on our website (meaning, you can't scrape metadata and reuse it), and a license for non-commercial use on our metadata, with the goal of working with partners who desire something more than faux collaboration.


You should know that we do share a portion of our metadata in the Learning Registry, that which is related to the Common Core. And we will continue to do so if we can license it as non-commercial. When Steve Midgley first came to us asking us to participate in the LR, we heartily agreed, because for the past ten years we have enjoyed putting ourselves out there in OER experimental pools, just for fun and to see what the real potential of this movement could be. We want to be part of that. LRMI, schema.org, sure, let's try it!


However, at the same time, we have begun to see a wave of open source pillaging. The power of open source software is the ability it gives people to build on code to create something better, different, or new. But so far, about 90 percent of the re-use of OER metadata I have seen in action (not in theory) is about commercial publishers looking to resell it, disguised as a service (if you don't believe me, look at our non-commercial resources inappropriately used here: http://oer.equella.com/access/home.do). That isn't the spirit of OER as some of us intended it. To me, OER is about access to education for all, in the public domain, forever, for free. It's not just an enticement to have something free, and then later be seduced to come back and buy the more high quality "education." After all, isn't that what we already have in place in US education--where those who can pay will have better access?


As for the Learning Registry, this has been an innovative effort led by Steve and others to create an exchange of metadata for the good of education. In 2011, the LR website stated, "The Learning Registry makes federal learning resources easier to find, easier to access and easier to integrate into learning environments wherever they are stored -- around the country and the world."


Then in 2012, the site stated, "The Learning Registry is an open source technical system designed to facilitate the exchange of data behind the scenes, and an open community of resource creators, publishers, curators, and consumers who are collaborating to broadly share resources."


Today it states, "The goal of the Learning Registry is to help you access high-quality digital resources for use with your students." The concept now reads much more like a front-end serving educators directly. This mission drift/shift, which is what most good organizations do as they evolve, seems very likely to me to end up serving the commercial sector first and foremost. Do I applaud the efforts? Yes! Keep innovating, everyone. Let's just call it like it is.


In other words, I'm not at all opposed to commercial for-profit efforts, but let's just not pretend that these are something else. What we are seeing more frequently these days is an OER storefront, supporting a freemium model with something that isn't even theirs. It's as if Barnes and Noble were to invite the public library to set up a display in the front of the store, so when you first walk in you see all these cool, highly curated books, serving the public good. But then when you step past the facade, you see it's just provided as an entryway to the commercial store, akin to using OER as a marketing mechanism for a future sale.


What motivates our decision to share our metadata with a non-commercial license is our commitment to provide education as a public good, while maintaining the ability to create our own sustainability models to do so.

Best,
Lisa Petrides

Sunday, July 14, 2013 7:42 PM
I think Jim provided a response more complete than what I offered - I agree with his argument. I'll add that metadata can be licensed precisely the way content on any website is licensed (except in cases where the metadata are *facts*, in which case you can't license them). And it faces the same legal issues: If you license your website "CC-By Non-commercial 3.0", how do you know that a commercial entity hasn't crawled your site using a Google crawler look-alike technology? You don't - you *trust* that businesses won't do that b/c if they get caught they would be breaking the law. Same for Learning Registry - if you don't want a commercial entity using your metadata, license it non-commercial. If your search results turn up in third-party sites, send them takedown notices. If anyone is profiting enough to be worth your time (meaning someone you'd want to sell your metadata to in the first place), I think you (and the community at large) will notice the abuse.

Regarding a white-list node based on ToS - Greg G was setting one up at CC, but he left for Wikimedia. I know CC would be interested in hosting this, if orgs like Curriki want to contribute?

The open public web presents all kinds of opportunities for abuse of content, and I'd argue that Learning Registry presents the same kinds of problems for metadata. But LR helps distribute and share metadata openly which opens up all kinds of interesting new solutions and products too (both non-commercial and commercial).

I'll be curious what Joshua and others think! This is an important discussion to hash out, I think.

Steve




--
--
---
This message is posted from the Google Groups "LearningRegistry" group. More information about the Learning Registry project can be found at http://learningregistry.org/
To post: learning...@googlegroups.com
To unsubscribe: learningregist...@googlegroups.com

For more options, visit this group at
http://groups.google.com/group/learningregistry?hl=en?hl=en
---
You received this message because you are subscribed to the Google Groups "Learning Registry" group.
To unsubscribe from this group and stop receiving emails from it, send an email to learningregist...@googlegroups.com.
Sunday, July 14, 2013 1:45 AM
See inline comments below.

On Jul 13, 2013, at 12:46 PM, "Joshua Marks" <jma...@curriki.org> wrote:

Steve,

First off, my apologies for taking so long to respond to this rather challenging thread. Both other priorities and the nuances involved have given me pause to think. I will first answer the technical items then touch on the related legal and business complexities that fall from them.

Regarding #1 - Well, having a TOS reference gives a human the ability to inspect the license or rights and determine whether the metadata can be used in a commercial context (however that is defined by the publisher of the metadata). A system (LR node filter, search index, LMS), however, cannot.

Seems like a bit of a red herring. The claim that a system could not whitelist/blacklist in the same manner a human can is not necessarily true. Discovering a new TOS requires human interpretation first, but once it has been interpreted, any mechanical system can be instructed how to honor it.
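
As a rough illustration of a machine honoring a TOS once a human has interpreted it, here is a minimal Python sketch of a consumer-side allowlist filter. It assumes the envelope carries its TOS URL at envelope["TOS"]["submission_TOS"] (check the LR spec for the exact field path), and the allowlist URLs are placeholders, not the real set of recognized LR terms.

```python
from typing import Iterable, Iterator, Optional

# Placeholder allowlist: TOS URLs a human has already read and approved.
# These are NOT the real recognized LR ToS URLs.
APPROVED_TOS = {
    "http://example.org/tos/cc0",
    "http://example.org/tos/cc-by",
}


def tos_url(envelope: dict) -> Optional[str]:
    """Return the TOS URL, assuming it lives at envelope["TOS"]["submission_TOS"]."""
    tos = envelope.get("TOS")
    if isinstance(tos, dict):
        return tos.get("submission_TOS")
    return None


def filter_by_tos(envelopes: Iterable[dict], approved: set) -> Iterator[dict]:
    """Yield envelopes whose TOS is on the allowlist; unknown or missing TOS are dropped."""
    for env in envelopes:
        if tos_url(env) in approved:
            yield env


envs = [
    {"doc_ID": "a", "TOS": {"submission_TOS": "http://example.org/tos/cc0"}},
    {"doc_ID": "b", "TOS": {"submission_TOS": "http://example.org/tos/by-nc"}},
    {"doc_ID": "c"},
]
print([e["doc_ID"] for e in filter_by_tos(envs, APPROVED_TOS)])  # ['a']
```

An allowlist fails closed: a TOS nobody has reviewed yet stays excluded until a person adds it, which matches the "human first, machine after" order described here.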



Regarding #1A - I guess you are suggesting that under DMCA this is an "out" for liability, because someone chose to publish the metadata, and as long as the commercial system using it claims ignorance and an inability to filter, there is no liability. In essence, I think you are suggesting an implied license to the end user of any system (commercial or not) by virtue of a publisher posting the records to the Learning Registry. This, I fear, is the realm of lawyers, which I am not. How one defines "use" of the metadata is really up to the publisher as far as I can see, and they might consider indexing the metadata not acceptable. In the case of Google, this is why they check a robots.txt file before indexing a domain. They have come to the position that Google will index you unless you use a structured method to tell them not to.
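
For reference, the robots.txt check mentioned here can be reproduced with Python's standard-library urllib.robotparser. The sketch parses a hypothetical robots.txt body directly (rather than fetching one over the network) and asks whether a given crawler may fetch a URL; the rules, user agent and URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /metadata/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /metadata/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly instead of calling read()
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "ExampleBot", "http://example.org/metadata/123"))  # False
print(may_crawl(ROBOTS_TXT, "ExampleBot", "http://example.org/lessons/123"))   # True
```

The check is purely advisory: it only has an effect if the crawler chooses to run it, which is the weakness of robots.txt raised in the reply.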

I'm not sure a robots.txt has any legal binding. There's been lots of discussion over the years - and other than being a "friendly convention" many choose to adhere to - I'm not aware of anyone who has gotten in trouble because they violated robots.txt. Again - I'm no lawyer, but IMO robots.txt is a weak defense, especially when blacklisting visitors you don't like is trivial to do. It's like leaving an open suitcase full of money on a sidewalk by itself, with a security camera in plain view recording passersby and a sign on the suitcase that says "don't touch".

I make the assumption that metadata falls into the same realm of copyright. You as the copyright holder have the burden to uphold your copyright, lest you lose your right. Hence, unless you copyright the metadata and do things to prevent infringement (enter binding contracts), including pursuing those who infringe, the law typically doesn't uphold your copyright for you. This is why the RIAA/MPAA try to go after everyone.



Regarding #3 - Agreed, this multiplicity is intrinsic; however, it is also not helpful, for all the reasons you indicate and more. This is particularly so when you are talking about frameworks and identifiers. The more synonyms and mappings one needs to track, the slower things get and the more data errors and noise increase. Also, as with source code, too much forking and versioning of content makes merging changes harder or impossible.

So yes, you are talking to the issues I am raising. But there is another related issue that you seem to be skirting. Publishers who make their metadata, content or services available only for non-commercial use do so in large part because their business and sustainability depend on being able to sell that metadata, content or service to commercial vendors who can create commercial value with it. Alternately, they use it to drive traffic to and use of their complete solutions and services, and generate revenue that way (directly or indirectly). Anyone who does this is highly disincentivized to provide this metadata to the Learning Registry, as it undercuts their sustainability, unless they can be assured that commercial actors will not use it in ways they do not want. So we have a legal stalemate that keeps non-government actors who value their metadata from publishing it to the Learning Registry.

I have to disagree. If you are solely in the market of selling metadata that's publicly accessible - IMHO you've got a business model that's on life support. I don't want to get into a deep criticism of this business model, and I'm not saying that it's not being done; there have been good reasons for it to be done in the past. But given the recent trends in social curation (Learni.st, Pinterest, tumblr, yelp, foursquare, stackexchange and others), where metadata (note I'm not referring to paradata) is just getting recreated and redistributed all over the place, the actual value of commercial metadata will at some point become more or less nil; the value lies in the interface and user experience of interacting with the data. I.e., the metadata about resources on referatories like Curriki, OER Commons, Brokers of Expertise, Gooru, and others at some point begins to normalize and commoditize - you can only describe the same resource so many ways before it starts looking the same to the end user regardless of minor differences (see Steve's "pretty much the same as" argument). The marketable differentiator then is the experience each uniquely provides in accessing or discovering resources. It becomes the same question as: why use Bing over Google? LR does not seek to provide the novel experience for accessing and experiencing the data. LR provides basic distribution of commodity metadata for learning resources.

Paradata is another story. I can agree that there is some market value to it - however, at a government level, I don't believe there is a way a state can capitalize on it. If you have a state-operated portal, it's not clear to me that the state can monetize the usage data, favorites, ratings, comments, etc. - hence, as long as sharing contains no PII and doesn't violate some law/policy, there seems to be no apparent negative fiscal impact (other than infrastructure and data transmission costs). A for-profit business, on the other hand, might have problems with sharing this kind of data if it is directly tied to its bottom line, especially if it has a novel way of curating its paradata to improve its customer experience.

IMHO, the Learning Registry metadata record should have the equivalent of robots.txt in it, and allow the publisher to explicitly say how the metadata can be used by any system, particularly search engines, commercial or non-commercial. Without such rights filtering capability at the node level (as well as source and format/content policy filtering), there is no reason to have more than one common node, and that node would effectively become the single open repository for all fully open published metadata and paradata (and no private or even non-commercial-use metadata).
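
To make the proposal concrete, here is a minimal sketch of what a robots.txt-style policy carried inside each record might look like, and how a consuming system could check it. Everything here is hypothetical: the LR envelope has no UsePolicy structure, and the field names, agent names and "purpose" values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UsePolicy:
    """Hypothetical per-record policy, in the spirit of robots.txt."""
    allow_commercial: bool = False
    allow_indexing: bool = True
    blocked_agents: set = field(default_factory=set)


def use_allowed(policy: UsePolicy, agent: str, commercial: bool, purpose: str) -> bool:
    """Return True if `agent` may use the record for `purpose` under `policy`."""
    if agent in policy.blocked_agents:
        return False
    if commercial and not policy.allow_commercial:
        return False
    if purpose == "index" and not policy.allow_indexing:
        return False
    return True


nc_policy = UsePolicy(allow_commercial=False)
print(use_allowed(nc_policy, "StateSearchPortal", commercial=False, purpose="index"))  # True
print(use_allowed(nc_policy, "VendorIndexer", commercial=True, purpose="index"))       # False
```

Like robots.txt, this only works if consumers choose to run the check; it expresses the publisher's intent rather than enforcing it.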

The TOS property is that, and can be used for such purposes. I'm just not sure using it exactly like a robots file makes sense; why not just use a metadata format that supports describing granular use rights, or define it externally like a sitemap.xml?

I would encourage people to set up specialty nodes or services that whitelist content by TOS - I think it could be a great value add; potentially a model for building a node with paid access to curated filters of the metadata.

Cheers,

- JK

There are no answers here, just questions. Sorry about that.

Joshua Marks

From: learning-regis...@googlegroups.com [mailto:learning-regis...@googlegroups.com] On Behalf Of Steve Midgley
Sent: Thursday, June 27, 2013 8:13 PM
To: <learning-regis...@googlegroups.com>
Cc: learningregistry
Subject: Re: [Learning Registry: Collaborate] Metadata Terms of Service

Hey Joshua,

I think you are raising two (and a half) distinct issues..

1) Metadata has a license and content has a license - they are not always the same.

1A) Can non-commercial metadata be distributed over commercial LR nodes?

2) Resources on the internet/web can have multiple URL/URI identifiers.

Let me try my best on these, one at a time..

#1 - Metadata rights vs content rights. I don't think LR has a technical deficiency regarding metadata licensing. The LR envelope has a field called "TOS": in this field, publishers provide a URL to the Terms of Service that describes how others may use the metadata they have published into the LR network. This field is required, but the TOS can be anything at all. If it is invalid or incomprehensible, then we recommend orgs do not consume/use those envelopes. This hasn't been a problem to date.
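
A consumer following this recommendation might first screen out envelopes whose TOS is missing or is not even a URL, before a person reviews the rest. The sketch below does only that syntactic check (whether a TOS is "incomprehensible" still needs a human), and it again assumes the envelope["TOS"]["submission_TOS"] field path, which should be checked against the LR spec.

```python
from typing import Optional
from urllib.parse import urlparse


def usable_tos(tos: Optional[str]) -> bool:
    """True only for an absolute http(s) URL with a host; anything else is skipped."""
    if not isinstance(tos, str):
        return False
    parts = urlparse(tos.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_by_tos(envelopes):
    """Return (consumable, skipped) lists based on each envelope's TOS value."""
    consumable, skipped = [], []
    for env in envelopes:
        block = env.get("TOS")
        tos = block.get("submission_TOS") if isinstance(block, dict) else None
        (consumable if usable_tos(tos) else skipped).append(env)
    return consumable, skipped


ok, skipped = split_by_tos([
    {"TOS": {"submission_TOS": "http://www.learningregistry.org/tos"}},
    {"TOS": {"submission_TOS": "see our website"}},
    {},
])
print(len(ok), len(skipped))  # 1 2
```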

Optionally, if a publisher wishes to *also* indicate a license for their content in LR, this is easily done via whatever metadata schema is provided in the LR payload that describes the content. For LRMI, provide the field "useRightsURL" (cf. http://www.lrmi.net/the-specification) pointing to the rights to use the content itself (e.g. http://creativecommons.org/licenses/by/3.0/). For Dublin Core, use the "rights" field (http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=elements#rights).
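
Since the metadata TOS (in the envelope) and the content license (in the payload) are separate, a consumer reads them from different places. Here is a minimal sketch for the content side, assuming the payload has already been parsed into a flat dictionary; real LRMI and Dublin Core payloads are usually JSON-LD, microdata or XML and need proper parsing first. The schema labels are illustrative, and both casings of the LRMI property are accepted because the exact spelling should be checked against the LRMI specification.

```python
from typing import Optional


def content_rights(payload_schema: str, payload: dict) -> Optional[str]:
    """Return the content-rights value from a simplified LRMI or Dublin Core payload."""
    schema = payload_schema.strip().lower()
    if schema == "lrmi":
        # Accept either casing of the LRMI use-rights property.
        return payload.get("useRightsUrl") or payload.get("useRightsURL")
    if schema in ("dc", "dublin core", "oai_dc"):
        return payload.get("rights")
    return None  # unknown schema: no content license found


cc_by = "http://creativecommons.org/licenses/by/3.0/"
print(content_rights("LRMI", {"useRightsURL": cc_by}))  # http://creativecommons.org/licenses/by/3.0/
print(content_rights("dc", {"rights": cc_by}))          # http://creativecommons.org/licenses/by/3.0/
```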

#1A - Non-commercial metadata distribution. I think you also are asking if commercial vendors who operate LR nodes must avoid non-commercial metadata on their nodes? I believe the answer to this is "no." Nodes are "dumb pipes" and so under DMCA (in my non-lawyerly but informed opinion) are classified as infrastructure. If a commercial operator were to *use* the metadata, then they would need to be sure they are not using non-commercial metadata (obviously). In fact, as I understand DMCA, if a commercial operator were to set up filtering by metadata then that node *might* not be classed as a "dumb pipe," which could *increase* the potential liability. Providing a full-feed of metadata on a node means the operator is not party to the metadata, just a "by-standing" infrastructure operator transporting it from A to B. LR was specifically designed to take advantage of this approach. I can't provide legal guarantees (no one can) but I talked about this in detail with several law professors who are experts in DMCA, copyright and the internet, so I think this issue is OK.

Did I address the deficiency you were describing - or can you clarify?

#2 - Multiple URLs that point to the same resource. This is a general problem, not intrinsic to LR. LR does provide an opportunity to share what is known about duplicate URL IDs. Think of it as a federated HTTP 300 or something like that. W3C has a proposal for Schema.org to allow these statements: http://www.w3.org/wiki/WebSchemas/sameAs. My opinion is that's a pretty good model that we should just use. One remaining problem is processing the "sameAs" statements - but LR is a metadata delivery tool, not a processing tool. I think a few people are working with SPARQL (ASN) or Neo4j (InBloom/LR search), and those graph databases would be the best place to process sameAs information coming from LR. But LR allows us to tell each other what we know about sameAs URLs that might crop up - so we don't all have to figure it out on our own!
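
For anyone without a graph database handy, the basic sameAs processing step can be sketched with a union-find (disjoint-set) structure: each sameAs statement merges two URLs into one group, and transitivity comes for free. This is a minimal in-memory Python sketch, not how ASN or the InBloom/LR search work; the URLs are made up.

```python
from collections import defaultdict


class SameAsIndex:
    """Union-find over URLs: each sameAs statement merges two URLs' groups."""

    def __init__(self) -> None:
        self.parent: dict = {}

    def find(self, url: str) -> str:
        """Return the representative URL of url's group (adding url if unseen)."""
        self.parent.setdefault(url, url)
        root = url
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every URL on the walk directly at the root.
        while self.parent[url] != root:
            self.parent[url], url = root, self.parent[url]
        return root

    def add_same_as(self, a: str, b: str) -> None:
        """Record the statement 'a sameAs b' by merging their groups."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def groups(self) -> list:
        """Return every group of equivalent URLs as a set."""
        by_root = defaultdict(set)
        for url in list(self.parent):
            by_root[self.find(url)].add(url)
        return list(by_root.values())


idx = SameAsIndex()
idx.add_same_as("http://a.example/video1", "http://b.example/video1")
idx.add_same_as("http://b.example/video1", "http://c.example/video1")
idx.find("http://d.example/lesson")  # an unrelated URL forms its own group
print(idx.find("http://c.example/video1") == idx.find("http://a.example/video1"))  # True
print(sorted(len(g) for g in idx.groups()))  # [1, 3]
```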

That said, I think there's a thornier problem: in addition to "sameAs" URLs around the web, there are also "prettyMuchTheSameAs" URLs all over the web. They resolve to different websites, with different toolbars, logins, etc., but from a user's perspective each resource is "pretty much the same" as the others. If you embed a Khan Academy video on NSDL and on CA Dept of Ed's teacher portal, the user will see the same video and would probably not think of the two pages as significantly different. Right now the web doesn't have a mechanism to talk about pages that are pretty much the same, but not exactly the same. I think it would be great if we could start sharing prettyMuchTheSameAs information with each other through Learning Registry (and elsewhere), b/c it could greatly reduce the number of times a user is presented with two results on a search results page that, for all the user cares, are equivalent. That's a bit down the road, but hopefully it makes sense what I'm getting at?
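
If prettyMuchTheSameAs information were shared, a search front end could use it to collapse results. The sketch below assumes a hypothetical mapping from each URL to a group key (for example, one derived from sameAs or prettyMuchTheSameAs statements) and keeps only the highest-ranked result from each group; the URLs and keys are invented.

```python
from typing import Dict, Iterable, List


def collapse_results(results: Iterable[str], group_of: Dict[str, str]) -> List[str]:
    """Keep the first (highest-ranked) result from each equivalence group.

    group_of maps a URL to a hypothetical group key; URLs missing from the
    map are treated as their own group.
    """
    seen = set()
    collapsed = []
    for url in results:
        key = group_of.get(url, url)
        if key not in seen:
            seen.add(key)
            collapsed.append(url)
    return collapsed


results = [
    "https://nsdl.example/khan-video",
    "https://cde.example/khan-video",
    "https://other.example/lesson",
]
group_of = {
    "https://nsdl.example/khan-video": "khan:video-1",
    "https://cde.example/khan-video": "khan:video-1",
}
print(collapse_results(results, group_of))
# ['https://nsdl.example/khan-video', 'https://other.example/lesson']
```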

What do you think? Am I talking about the issues you're raising?

Best,

Steve

On Thu, Jun 27, 2013 at 7:36 PM, Joshua Marks <jma...@curriki.org> wrote:

Steve,

So it seems there is a technical deficiency, in that the LR content record would need to contain an additional listing of use rights for the metadata, in addition to the use rights for the content it describes. This would support your described approach of enabling release of metadata with additional or variable restrictions. Nodes or indexes used by commercial vendors would need to filter out all non-commercial metadata records (even if they point to more liberally licensed assets).
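
The filtering described here keys on the license of the metadata record, not on the license of the asset it points to. A minimal sketch, where metadata_license, content_license and the license codes are hypothetical field names and values, not existing LR envelope fields:

```python
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    metadata_license: str  # hypothetical: rights on the metadata record itself
    content_license: str   # rights on the resource the record describes


# Hypothetical allowlist of metadata licenses a vendor has cleared for commercial use.
COMMERCIAL_OK = {"CC0", "CC-BY"}


def commercially_usable(records):
    """Keep records whose *metadata* license is cleared for commercial use.

    The content license is deliberately ignored here: using the described
    resource is a separate question from using the metadata about it.
    """
    return [r for r in records if r.metadata_license in COMMERCIAL_OK]


records = [
    Record("a", metadata_license="CC0", content_license="CC-BY-NC"),
    Record("b", metadata_license="CC-BY-NC", content_license="CC-BY"),
    Record("c", metadata_license="CC-BY", content_license="CC-BY"),
]
print([r.doc_id for r in commercially_usable(records)])  # ['a', 'c']
```

Record "b" is dropped even though it points to a CC-BY asset, which is exactly the case described above.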

Now this brings up a related topic about multiple GUIDs for the same asset that I have been wondering about. Let's say Khan publishes all his videos' metadata via LR. Let's also say that Curriki, OER Commons, Engage NY, and the Illinois repository each have users who create content records for the same asset, as a link to the same YouTube URL/ID, and that these records are variably described and also published to the LR. So now there are N different content records with different metadata and GUIDs about the same asset. Hmmmmm. Not as useful as we would like.
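
A consumer can at least detect this situation by grouping records on a normalized form of the resource URL. The normalization rule below (YouTube watch and youtu.be links reduced to the video ID, other URLs reduced to host and path) is a hypothetical example, not a standard, and the GUIDs are invented.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def asset_key(url: str) -> str:
    """Hypothetical normalization of a resource URL to an 'asset' key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtube.com" and parts.path == "/watch":
        video_id = parse_qs(parts.query).get("v", [""])[0]
        if video_id:
            return "youtube:" + video_id
    if host == "youtu.be" and parts.path.strip("/"):
        return "youtube:" + parts.path.strip("/")
    return host + parts.path.rstrip("/")


def records_per_asset(records):
    """Group (guid, url) pairs by asset key; groups with >1 GUID are duplicates."""
    groups = defaultdict(list)
    for guid, url in records:
        groups[asset_key(url)].append(guid)
    return dict(groups)


records = [
    ("guid-1", "https://www.youtube.com/watch?v=abc123"),
    ("guid-2", "http://youtu.be/abc123"),
    ("guid-3", "https://youtube.com/watch?v=abc123&t=30"),
    ("guid-4", "https://example.org/lesson/"),
]
print(records_per_asset(records))
# {'youtube:abc123': ['guid-1', 'guid-2', 'guid-3'], 'example.org/lesson': ['guid-4']}
```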

The issue here is a fundamental misconception of what Open means. The publisher of content records should be the official copyright holder's hosting location for the actual source of the asset. It is problematic to have systems that are not content repositories, but rather act as referatories (links + metadata), generate GUIDs for content. But if the actual source location is not an LR publishing node, what is to be done?

Joshua Marks

CTO

Curriki: The Global Education and Learning Community

jma...@curriki.org

www.curriki.org

US 831-685-3511

I welcome you to become a member of the Curriki community, to follow us on Twitter and to say hello on our blog, Facebook and LinkedIn communities.

From: learning-regis...@googlegroups.com [mailto:learning-regis...@googlegroups.com] On Behalf Of Steve Midgley
Sent: Thursday, June 27, 2013 3:24 PM
To: <learning-regis...@googlegroups.com>
Cc: learningregistry
Subject: Re: [Learning Registry: Collaborate] Metadata Terms of Service

I don't think NSDL has announced anything about how they license their metadata. I believe they raised this issue in the public forum to indicate they are trying to figure out how to share their metadata. Should they use a non-commercial metadata license? Should they restrict access to their metadata to only certain partners? They're trying to decide what's next I think..

In my opinion, orgs can get a lot of heat for releasing metadata into the open under restrictive licenses, but orgs that don't release their metadata *at all* in the open don't get talked about as much. I'd encourage us all to have an apples-to-apples comparison when evaluating the decisions that all orgs have to make. There are totally legitimate business reasons to restrict access to metadata, but releasing metadata under restrictive terms should get more credit in terms of "openness" than not releasing it at all.

And of course, Diny and ASN are highly permissive with their metadata (yay!), so credit to Jes and co for making their IP available to everyone to improve the open infrastructure in education.

Steve

On Thu, Jun 27, 2013 at 4:18 PM, Diny Golder <di...@jesandco.org> wrote:

Greetings, I don't recall discussions about "restricting the use" of NSDL metadata. Now, there may be resources that are copyrighted here and there, and the creators have the right to do what they want under NSF funding (commercial development is encouraged in NSF projects), but the NSDL program was designed to share metadata, and there are commercial education products out there that include NSDL metadata in the offering, which is not restricted. If the creator decides to charge for the resource, I believe they can. NSDL projects were encouraged to become self-sustainable, hence charging for whatever came out of the grant. I know that we can charge for ASN data, some of which was created with NSF funds. Charging or not charging is our decision. But, whether we charge or don't charge, commercial or non-commercial organizations can use ASN data. The licensing is up to us, and we expect users of ASN data to adhere to our licensing requirements.

Many thanks, Diny

Diny Golder ~ Sent from my iPad


On Jun 27, 2013, at 7:28 AM, "Kelly Peet" <ke...@academicbenchmarks.com> wrote:

My input on the topic starts with some foundational questions:

1) Without having looked very deeply, a quick glance at nsdl.org indicates funding by NSF. A quick glance at nsf.gov indicates funding by an act of Congress, which leads me (probably naively) to conclude that the content was created using public funds. First question: how can a publicly-funded organization hold copyright to its original works? The answer would seem to precede any discussion about metadata (or meta-meta data) copyright considerations.

NOTE: my question is born out of ignorance more than any attempt to be inflammatory.

2) Presuming copyright does prevail, would it violate non-commercial use if a for-profit company were to develop a product in which such metadata were given away freely, but for which the free product were a companion to a product or integrated into a product that makes a profit?

3) I guess a way to restate question #2 might be to inquire about fair use as it might relate to, say, search engines harvesting the published metadata for use in algorithms to improve the relevance of results that might point to the NSDL original works. The advertising revenues of such for-profit search engines certainly violate the non-commercial aspects, but under fair use, such harvesting is foundational to today's largest internet properties. I guess I don't understand what is being protected in the meta (meta-meta) data here.
kelly

On Sat, Jun 22, 2013 at 11:35 AM, Joshua Marks <jma...@curriki.org> wrote:

All,

This is really tricky. You are talking about the rights to the metadata rather than the content being described in the metadata. (Meta-meta data?) It seems problematic to restrict sharing and use of metadata in an open system like LR. Also, there is the issue of fair use, where others can create the same or similar metadata under a different license or terms. My suggestion is to never restrict the metadata to non-commercial, even if the content itself is. Otherwise a single non-commercial record in the LR would prevent any commercial use of the Learning Registry. Presently there is no mechanism to know the rights or limitations of metadata in the LR (other than the TOS, which may not apply to other nodes managed by others), so by virtue of being published, metadata should be considered usable for commercial and non-commercial use.

The other approach is to say LR is for non-commercial use only, which would preclude for-profit entities from using or integrating with the LR, as there would be no way to filter out the non-commercial metadata for potentially commercial-compatible content.

A third approach is to add an element to the LR record that carries the license for the LR record itself, so a commercial entity can filter out and exclude any non-commercial metadata or paradata record from their node, index or consuming application.

Joshua Marks

CTO

Curriki: The Global Education and Learning Community

jma...@curriki.org

www.curriki.org

US 831-685-3511

From: learning-regis...@googlegroups.com [mailto:learning-regis...@googlegroups.com] On Behalf Of Steve Midgley
Sent: Friday, June 21, 2013 11:54 PM
To: <learning-regis...@googlegroups.com>; learningregistry
Subject: Re: [Learning Registry: Collaborate] Metadata Terms of Service

Looping in the main Learning Registry list onto this important question because many folks on that list might not yet be uploading to LR but will (hopefully) have opinions on this issue. See John's question below. Here's my summary and some additional questions that seem related:

Would a non-commercial metadata license prevent use of NSDL content?

Would a clear license / price for commercial use specified make it easier / better for commercial consumers of content?

How specific would the definition of non-commercial need to be to make qualified organizations comfortable they "fit" under the term?

Steve

On Fri, Jun 21, 2013 at 7:24 PM, John Weatherley <jwea...@ucar.edu> wrote:

Hi All,

We've been discussing here at NSDL about the Terms of Service that we attach to our metadata/data submissions to the public LR node. Currently, NSDL's terms are for non-commercial use only, similar to http://creativecommons.org/licenses/by-nc/3.0/

The ToS guidelines for the public LR indicate, however, that all submissions should be placed under one of four recognized ToS, all of which allow for commercial uses (see http://www.learningregistry.org/tos).

I'd be curious to hear whether anyone has made submissions that restrict to non-commercial use only? From a philosophical and practice point of view, would/does this prevent others from using NSDL metadata in a widespread manner?

-john


------------------------------------------------------
John Weatherley
National Science Digital Library (NSDL)

--
--
You received this message because you are subscribed to the Google
Groups "Learning Registry: Collaborate" group.

---
You received this message because you are subscribed to the Google Groups "Learning Registry: Collaborate" group.
To unsubscribe from this group and stop receiving emails from it, send an email to learning-registry-co...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Saturday, July 13, 2013 12:46 PM

Steve,

First off, my apologies for taking so long to respond to this rather challenging thread. Other priorities and the nuances involved have both given me pause. I will first answer the technical items, then touch on the related legal and business complexities that follow from them.

Regarding #1: Having a TOS reference gives a human the ability to inspect the license or rights and determine whether the metadata may be used in a commercial context (however the publisher of the metadata defines that). A system (an LR node filter, search index, or LMS), however, cannot.

Regarding #1A: I gather you are suggesting that under the DMCA this is an "out" for liability: someone chose to publish the metadata, and as long as the commercial system using it claims ignorance and an inability to filter, there is no liability. In essence, I think you are suggesting an implied license to the end user of any system (commercial or not) by virtue of a publisher posting the records to the Learning Registry. I fear this is the realm of lawyers, and I am not one. As far as I can see, how one defines "use" of the metadata is really up to the publisher, and a publisher might consider indexing the metadata unacceptable. In Google's case, this is why they check a domain's robots.txt file before indexing it: their position is that Google will index you unless you use a structured method to tell them not to.

Regarding #2: Agreed, this multiplicity is intrinsic; however, it is also unhelpful for all the reasons you indicate and more. This is particularly so when you are talking about frameworks and identifiers. The more synonyms and mappings one needs to track, the slower things get and the more data errors and noise increase. Also, as with source code, too much forking and versioning of content makes merging changes harder or impossible.

So yes, you are speaking to the issues I am raising. But there is another related issue that you seem to be skirting. Publishers who make their metadata, content, or services available only for non-commercial use do so in large part because their business and sustainability depend on being able to sell that metadata, content, or service to commercial vendors who can create commercial value with it. Alternatively, they use it to drive traffic to, and use of, their complete solutions and services, and generate revenue that way (directly or indirectly). Anyone who does this is highly disincentivized from providing this metadata to the Learning Registry, as it undercuts their sustainability, unless they can be assured that commercial actors will not use it in ways they do not want. So we have a legal stalemate that keeps non-government actors who value their metadata from publishing it to the Learning Registry.

IMHO, the Learning Registry metadata record should contain the equivalent of robots.txt, allowing the publisher to state explicitly how the metadata may be used by any system, commercial or non-commercial, and particularly by search engines. Without node-level filtering on such rights (as well as on source and format/content policy), there is no reason to have more than one common node, and that node would effectively become the single open repository for all fully open published metadata and paradata (and no private or even non-commercial-use metadata).
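
Joshua's robots.txt analogy can be made concrete. The sketch below is purely illustrative: the policy shape (an agent category mapped to allowed actions, with "*" as the catch-all group) and the category and action names are hypothetical, not part of any LR envelope described in this thread. It follows robots.txt's group selection (a matching agent group is used instead of "*", not merged with it) and, like robots.txt, treats silence as permission unless the consumer passes a stricter default.

```python
from __future__ import annotations

from typing import Mapping

# Hypothetical per-record usage policy: agent category -> {action: allowed}.
Policy = Mapping[str, Mapping[str, bool]]


def is_allowed(policy: Policy, agent: str, action: str, default: bool = True) -> bool:
    """Decide whether `agent` may perform `action` on a record.

    Group selection follows robots.txt: if the policy has a group for this
    agent category, only that group is consulted; otherwise the "*" group is.
    Actions the chosen group does not mention fall back to `default`
    (True mirrors robots.txt, where anything not disallowed is allowed).
    """
    group = policy[agent] if agent in policy else policy.get("*", {})
    return group.get(action, default)


record_policy = {
    "*": {"index": True, "display": True},
    "commercial": {"index": False},
}

print(is_allowed(record_policy, "commercial", "index"))      # False
print(is_allowed(record_policy, "commercial", "display"))    # True (group is silent; default applies)
print(is_allowed(record_policy, "noncommercial", "index"))   # True (falls back to "*")
print(is_allowed({}, "commercial", "index", default=False))  # False (cautious default)
```

In the second case the "*" rule allowing display is never consulted, because a commercial group exists. That is how robots.txt behaves, and it is the kind of detail a real specification would need to settle explicitly.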

There are no answers here, just questions. Sorry about that.

Joshua Marks

From: learning-regis...@googlegroups.com [mailto:learning-regis...@googlegroups.com] On Behalf Of Steve Midgley
Sent: Thursday, June 27, 2013 8:13 PM
To: <learning-regis...@googlegroups.com>
Cc: learningregistry
Subject: Re: [Learning Registry: Collaborate] Metadata Terms of Service

Hey Joshua,

I think you are raising two (and a half) distinct issues.

1) Metadata has a license and content has a license - they are not always the same.

1A) Can non-commercial metadata be distributed over commercial LR nodes?

2) Resources on the internet/web can have multiple URL/URI identifiers.

Let me try my best on these, one at a time.

#1 - Metadata rights vs. content rights. I don't think LR has a technical deficiency regarding metadata licensing. The LR envelope has a field called "TOS", in which publishers provide a URL to the Terms of Service describing how others may use the metadata they have published into the LR network. This field is required, but the TOS can be anything at all. If it is invalid or incomprehensible, we recommend that orgs not consume or use those envelopes. This hasn't been a problem to date.
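
On the consuming side, "do not consume envelopes whose TOS is invalid or incomprehensible" reduces to an allow-list check, since software can compare TOS URLs but cannot interpret them (the point Joshua raises in his reply). A minimal sketch, assuming the envelope carries its terms under a `TOS` object with a `submission_TOS` URL (verify the exact field layout against the LR specification) and that the consumer keeps its own list of TOS URLs it has reviewed. The URLs below are placeholders, not the official LR ToS list.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator

# Hypothetical allow-list: TOS URLs this consumer has actually reviewed and
# accepted. Placeholders only, not the official LR ToS list.
ACCEPTED_TOS = {
    "https://example.org/tos/reviewed-terms-1",
    "https://example.org/tos/reviewed-terms-2",
}


def tos_url(envelope: dict[str, Any]) -> str | None:
    """Return the envelope's TOS URL, assuming the layout {"TOS": {"submission_TOS": url}}."""
    tos = envelope.get("TOS")
    if isinstance(tos, dict):
        url = tos.get("submission_TOS")
        if isinstance(url, str):
            return url.strip()
    return None


def consumable(envelopes: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield envelopes whose TOS is on the allow-list.

    Missing, malformed, or unrecognized TOS values are skipped, not guessed at.
    """
    for env in envelopes:
        if tos_url(env) in ACCEPTED_TOS:
            yield env


envelopes = [
    {"doc_ID": "a", "TOS": {"submission_TOS": "https://example.org/tos/reviewed-terms-1"}},
    {"doc_ID": "b", "TOS": {"submission_TOS": "https://example.org/tos/unknown"}},
    {"doc_ID": "c"},  # no TOS at all
]
print([e["doc_ID"] for e in consumable(envelopes)])  # ['a']
```

Skipping unknown terms is the conservative choice; a consumer that wants more coverage has to review more TOS documents and grow the list by hand.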

Optionally, if a publisher wishes to *also* indicate a license for their content in LR, this is easily done via whatever metadata schema is used in the LR payload to describe the content. For LRMI, provide the field "useRightsUrl" (cf. http://www.lrmi.net/the-specification) pointing to the rights to use the content itself (e.g. http://creativecommons.org/licenses/by/3.0/). For Dublin Core, use the "rights" field (http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=elements#rights).
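
The split between metadata terms (in the envelope) and content rights (in the payload) can be sketched as two separate lookups. This assumes an inline, already-parsed JSON payload under a `resource_data` key and the `TOS.submission_TOS` envelope layout; it checks the LRMI property in both capitalizations (`useRightsUrl` and `useRightsURL`) to be tolerant of either, then a Dublin Core-style `rights` key. Real Dublin Core payloads are usually XML and would need parsing first, so treat this as a simplification.

```python
from __future__ import annotations

from typing import Any


def metadata_terms(envelope: dict[str, Any]) -> str | None:
    """Terms for the metadata itself: the envelope-level TOS (layout assumed)."""
    tos = envelope.get("TOS")
    return tos.get("submission_TOS") if isinstance(tos, dict) else None


def content_rights(envelope: dict[str, Any]) -> str | None:
    """Rights for the described resource, read from an inline JSON-like payload.

    Tries the LRMI property in either capitalization, then a Dublin Core-style
    "rights" key. Returns None when the payload states no content rights.
    """
    payload = envelope.get("resource_data")
    if not isinstance(payload, dict):
        return None  # XML or by-reference payloads are out of scope for this sketch
    for key in ("useRightsUrl", "useRightsURL", "rights"):
        value = payload.get(key)
        if isinstance(value, str) and value:
            return value
    return None


env = {
    "TOS": {"submission_TOS": "https://example.org/metadata-terms-nc"},
    "resource_data": {
        "name": "Intro video",
        "useRightsUrl": "http://creativecommons.org/licenses/by/3.0/",
    },
}
print(metadata_terms(env))  # https://example.org/metadata-terms-nc
print(content_rights(env))  # http://creativecommons.org/licenses/by/3.0/
```

The two answers can legitimately differ, as in this example: non-commercial terms on the metadata describing a CC BY resource.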

#1A - Non-commercial metadata distribution. I think you are also asking whether commercial vendors who operate LR nodes must avoid carrying non-commercial metadata on their nodes. I believe the answer is "no." Nodes are "dumb pipes," so under the DMCA (in my non-lawyerly but informed opinion) they are classified as infrastructure. If a commercial operator were to *use* the metadata, then they would obviously need to be sure they are not using non-commercial metadata. In fact, as I understand the DMCA, if a commercial operator were to set up filtering by metadata, that node *might* not be classed as a "dumb pipe," which could *increase* its potential liability. Providing a full feed of metadata on a node means the operator is not party to the metadata, just a "by-standing" infrastructure operator transporting it from A to B. LR was specifically designed to take advantage of this approach. I can't provide legal guarantees (no one can), but I discussed this in detail with several law professors who are experts in the DMCA, copyright, and the internet, so I think this issue is OK.

Did I address the deficiency you were describing, or can you clarify?

#2 - Multiple URLs that point to the same resource. This is a general problem, not one intrinsic to LR. LR does provide an opportunity to share what is known about duplicate URL IDs; think of it as a federated HTTP 300 (Multiple Choices) or something like that. W3C has a proposal for Schema.org to allow these statements: http://www.w3.org/wiki/WebSchemas/sameAs. In my opinion that's a pretty good model that we should just use. One remaining problem is processing the "sameAs" statements, but LR is a metadata delivery tool, not a processing tool. I think a few people are working with SPARQL (ASN) or Neo4j (inBloom/LR search), and those graph databases would be the best place to process sameAs information coming from LR. But LR lets us tell each other what we know about sameAs URLs that might crop up, so we don't all have to figure it out on our own!
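
Processing sameAs statements is mostly a matter of computing equivalence classes: if A sameAs B and B sameAs C, all three identify one resource. Steve suggests a graph database for this; as a minimal in-memory alternative for illustration, a union-find (disjoint-set) structure does the same grouping. The input format (plain pairs of URL strings) is an assumption, not an LR data format.

```python
from __future__ import annotations

from collections import defaultdict


class SameAsIndex:
    """Union-find over URLs: each sameAs pair merges two equivalence classes."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, url: str) -> str:
        self._parent.setdefault(url, url)
        root = url
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk directly at the root.
        while self._parent[url] != root:
            nxt = self._parent[url]
            self._parent[url] = root
            url = nxt
        return root

    def add(self, a: str, b: str) -> None:
        """Record one sameAs statement between URLs a and b."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def same(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def groups(self) -> list[set[str]]:
        """Return every equivalence class seen so far."""
        buckets: dict[str, set[str]] = defaultdict(set)
        for url in list(self._parent):
            buckets[self._find(url)].add(url)
        return list(buckets.values())


idx = SameAsIndex()
idx.add("http://example.org/video/1", "https://example.org/video/1")
idx.add("https://example.org/video/1", "https://mirror.example.net/v1")
print(idx.same("http://example.org/video/1", "https://mirror.example.net/v1"))  # True
print(len(idx.groups()))  # 1
```

Because equivalence is transitive, statements published by different organizations through LR can be merged this way without anyone having seen the full chain.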

That said, I think there's a thornier problem: in addition to "sameAs" URLs around the web, there are also "prettyMuchTheSameAs" URLs all over the web. They resolve to different websites, with different toolbars, logins, etc., but from a user's perspective each resource is "pretty much the same" as the others. If you embed a Khan Academy video on NSDL and on the CA Dept of Ed's teacher portal, the user will see the same video and probably won't think of the two pages as significantly different. Right now the web has no mechanism for talking about pages that are pretty much the same but not exactly the same. I think it would be great if we could start sharing prettyMuchTheSameAs information with each other through the Learning Registry (and elsewhere), because it could greatly reduce the number of times a user is presented with two resources on a search results page that, for all the user cares, are equivalent. That's a bit down the road, but hopefully what I'm getting at makes sense?
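
If prettyMuchTheSameAs information were shared, a search front end could use it to collapse near-duplicates. A minimal sketch, assuming each result already carries a hypothetical `equivalence_key` (for example, one derived from the embedded video) and that results arrive in rank order; results sharing a key are shown once, keeping the highest-ranked one.

```python
from __future__ import annotations

from typing import Any


def collapse_results(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first (highest-ranked) result per hypothetical equivalence key.

    Results without a key are never collapsed, since nothing is known about
    their equivalence to anything else.
    """
    seen: set[str] = set()
    collapsed = []
    for result in results:
        key = result.get("equivalence_key")
        if key is None:
            collapsed.append(result)
        elif key not in seen:
            seen.add(key)
            collapsed.append(result)
    return collapsed


results = [
    {"url": "https://nsdl.example/page/42", "equivalence_key": "youtube:abc123"},
    {"url": "https://portal.example/lesson/7", "equivalence_key": "youtube:abc123"},
    {"url": "https://example.org/other"},
]
print([r["url"] for r in collapse_results(results)])
# ['https://nsdl.example/page/42', 'https://example.org/other']
```

Keeping the first occurrence preserves the existing ranking; a front end could instead keep the hidden duplicates around as "also available at" links.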

What do you think? Am I talking about the issues you're raising?

Best,

Steve

On Thu, Jun 27, 2013 at 7:36 PM, Joshua Marks <jma...@curriki.org> wrote:

Steve,

So it seems there is a technical deficiency: the LR content record would need to contain a listing of use rights for the metadata in addition to the use rights for the content it describes. This would support the approach you describe of enabling release of metadata with additional or variable restrictions. Nodes or indexes used by commercial vendors would need to filter out all non-commercial metadata records (even if they point to more liberally licensed assets).
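
The filtering Joshua describes hinges on checking the metadata's terms independently of the content's license. A minimal sketch, assuming each record exposes the two as separate fields (`metadata_terms` and `content_license` are hypothetical names) and that the operator keeps its own set of metadata terms it has judged to permit commercial use.

```python
from __future__ import annotations

from typing import Any, Iterable

# Hypothetical: metadata terms this operator has judged to allow commercial use.
COMMERCIAL_OK_METADATA_TERMS = {
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "http://creativecommons.org/licenses/by/3.0/",
}


def for_commercial_index(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep records whose *metadata* terms allow commercial use.

    The content license is deliberately ignored: a liberally licensed resource
    does not make non-commercial metadata about it usable commercially.
    """
    return [r for r in records if r.get("metadata_terms") in COMMERCIAL_OK_METADATA_TERMS]


records = [
    {"id": 1,
     "metadata_terms": "http://creativecommons.org/licenses/by-nc/3.0/",
     "content_license": "http://creativecommons.org/licenses/by/3.0/"},
    {"id": 2,
     "metadata_terms": "http://creativecommons.org/publicdomain/zero/1.0/",
     "content_license": "http://creativecommons.org/licenses/by-nc/3.0/"},
]
print([r["id"] for r in for_commercial_index(records)])  # [2]
```

Record 1 is dropped despite pointing at CC BY content. Record 2 stays because its metadata may be indexed commercially, though the non-commercial content it describes still carries its own restrictions.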

Now this brings up a related topic that I have been wondering about: multiple GUIDs for the same asset. Let's say Khan publishes all his videos' metadata via LR. Let's also say that Curriki, OER Commons, Engage NY, and the Illinois repository each have users who create content records for the same asset, as links to the same YouTube URL/ID, that are variously described and also published to the LR. Now there are N different content records, with different metadata and GUIDs, about the same asset. Hmmmmm. Not as useful as we would like.
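
One partial remedy for the N-records problem is for consumers to group records by a normalized resource identifier rather than by each record's GUID. A minimal sketch for the YouTube case, assuming each record's resource URL is available under a `resource_locator` key (the `guid` keys in the example are hypothetical). Only the `youtube.com/watch?v=ID` and `youtu.be/ID` forms are recognized; any other URL is used as-is.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any
from urllib.parse import parse_qs, urlparse


def resource_key(url: str) -> str:
    """Normalize a resource URL to a grouping key.

    Only two YouTube URL shapes are recognized; anything else is returned unchanged.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com") and parsed.path == "/watch":
        video_ids = parse_qs(parsed.query).get("v")
        if video_ids:
            return "youtube:" + video_ids[0]
    if host == "youtu.be" and len(parsed.path) > 1:
        return "youtube:" + parsed.path.lstrip("/").split("/")[0]
    return url


def group_by_resource(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket records by the normalized key of the resource they describe."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[resource_key(record["resource_locator"])].append(record)
    return dict(groups)


records = [
    {"guid": "curriki-1", "resource_locator": "https://www.youtube.com/watch?v=abc123&t=30s"},
    {"guid": "oer-7", "resource_locator": "https://youtu.be/abc123"},
    {"guid": "il-9", "resource_locator": "https://example.org/lesson"},
]
print({k: [r["guid"] for r in v] for k, v in group_by_resource(records).items()})
# {'youtube:abc123': ['curriki-1', 'oer-7'], 'https://example.org/lesson': ['il-9']}
```

This only helps where the hosting service has a stable ID that can be recovered from the URL; it does not resolve which record is authoritative, which is the deeper question raised below.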

The issue here is a fundamental misconception of what "open" means. The publisher of content records should be the official copyright holder's hosting location for the actual source of the asset. It is problematic to have systems that are not content repositories, but rather act as referatories (links + metadata), generate GUIDs for content. But if the actual source location is not an LR publishing node, what is to be done?

Joshua Marks

CTO

Curriki: The Global Education and Learning Community

jma...@curriki.org

www.curriki.org

US 831-685-3511

I welcome you to become a member of the Curriki community, to follow us on Twitter and to say hello on our blog, Facebook and LinkedIn communities.

From: learning-regis...@googlegroups.com [mailto:learning-regis...@googlegroups.com] On Behalf Of Steve Midgley
Sent: Thursday, June 27, 2013 3:24 PM
To: <learning-regis...@googlegroups.com>
Cc: learningregistry
Subject: Re: [Learning Registry: Collaborate] Metadata Terms of Service

I don't think NSDL has announced anything about how they license their metadata. I believe they raised this issue in the public forum to indicate they are trying to figure out how to share their metadata. Should they use a non-commercial metadata license? Should they restrict access to their metadata to only certain partners? I think they're trying to decide what's next.

In my opinion, orgs can take a lot of heat for releasing metadata into the open under restrictive licenses, while orgs that don't release their metadata *at all* don't get talked about as much. I'd encourage us all to make an apples-to-apples comparison when evaluating the decisions all orgs have to make. There are totally legitimate business reasons to restrict access to metadata, but releasing metadata under restrictive terms should get more credit for "openness" than not releasing it at all.

And of course, Diny and ASN are highly permissive with their metadata (yay!), so credit to Jes and co for making their IP available to everyone to improve the open infrastructure in education.

Steve

On Thu, Jun 27, 2013 at 4:18 PM, Diny Golder <di...@jesandco.org> wrote:

Greetings. I don't recall discussions about "restricting the use" of NSDL metadata. There may be resources here and there that are copyrighted, and the creators have the right to do what they want with them under NSF funding (commercial development is encouraged in NSF projects), but the NSDL program was designed to share metadata, and there are commercial education products out there that include NSDL metadata in their offerings, which is not restricted. If the creator decides to charge for the resource, I believe they can. NSDL projects were encouraged to become self-sustaining, hence charging for whatever came out of the grant. I know that we can charge for ASN data, some of which was created with NSF funds. Charging or not charging is our decision. But whether we charge or not, commercial or non-commercial organizations can use ASN data. The licensing is up to us, and we expect users of ASN data to adhere to our licensing requirements.

Many thanks, Diny

Diny Golder ~ Sent from my iPad


On Jun 27, 2013, at 7:28 AM, "Kelly Peet" <ke...@academicbenchmarks.com> wrote:

--
--
---
This message is posted from the Google Groups "LearningRegistry" group. More information about the Learning Registry project can be found at http://learningregistry.org/
To post: learning...@googlegroups.com
To unsubscribe: learningregist...@googlegroups.com

For more options, visit this group at
http://groups.google.com/group/learningregistry?hl=en?hl=en
---
You received this message because you are subscribed to the Google Groups "Learning Registry" group.
To unsubscribe from this group and stop receiving emails from it, send an email to learningregist...@googlegroups.com.
--
--
---
This message is posted from the Google Groups "LearningRegistry" group. More information about the Learning Registry project can be found at http://learningregistry.org/
To post: learning...@googlegroups.com
To unsubscribe: learningregist...@googlegroups.com

For more options, visit this group at