Using a well known location for information can be useful as a
bootstrapping mechanism, but is a poor design principle for new
services. It implies that every time you want to describe resources
for a new information domain, you need a new well known location,
potentially duplicating the information in others. Sitemaps overcome
this to a certain extent by "hooking" robots.txt, but even that seems
like a partial solution.
oer.txt as currently proposed suffers from a lack of flexibility: the
current proposal doesn't say anything about why the resources linked
are relevant, and for what purpose (i.e., for a particular subject? a
particular education level? Ideally this is published with the
resource, but you can imagine situations where other curators have
differing views). It also doesn't provide any self-description of
who's asserting that they're OER. You could conceivably look at the
domain registration for information about this, but that's further
muddled by the fact that you're [potentially] pointing at other
domains: which registration do you retrieve, and what's the
relationship of one to the other?
Finally, the use of a well known location at the domain level
effectively restricts who can publish these maps/assertions to people
who control domains. In a world of linked data, that seems like an
unnecessary limitation.
It seems like a better model for enabling discovery is using a machine
readable document linked from the resource or root page of a
collection. This could provide simple pointers to services, similar to
oer.txt, but could also provide richer information about the resources
on a site. Ideally this would be in a general format (I'm biased to
RDF, but you can imagine using POWDER or something else) that would
allow you to provide additional information about resources in
addition to labeling them as "educational".
NRY
----
Nathan R. Yergler
Chief Technology Officer
Creative Commons
Thank you very much for this. Some of my thoughts are below, including
perhaps the seed of a better proposal.
Before this thread expands too much, I want to be clear on something:
I'm not married to the oer.txt solution and I would consider this
proposal a success if the community talked and reached any solution,
regardless of what that solution is. I get the sense we all agree we
need something like this but that oer.txt as proposed has weaknesses.
Fine, let's do better and kill it :)
By the way, when I say Sitemaps (capitalised) I'm referring to the
Sitemaps protocol and when I say sitemaps (lowercase) it's the sitemap
file.
> Using a well known location for information can be useful as a
> bootstrapping mechanism, but is a poor design principle for new
> services. It implies that every time you want to describe resources
> for a new information domain, you need a new well known location,
> potentially duplicating the information in others. Sitemaps overcome
> this to a certain extent by "hooking" robots.txt, but even that seems
> like a partial solution.
When I was thinking about this, I came up with exactly these two solutions:
1. The "general" one where we can hook into robots.txt exactly like
Sitemaps did. I discounted this idea for two reasons:
a. The Sitemaps protocol had the weight of the top 3 search engines
behind it from the get go. It created instant real demand overnight.
Our small industry doesn't have such heavyweights to create similar
demand.
b. Sitemaps specified an XML format that had to be strictly adhered to
or else it failed. The XML is not really "extensible" as far as I
know: you either strictly follow the Sitemaps schema or it's an
invalid sitemap. I'm pretty sure you can't extend it by introducing
new namespaces. For our purposes, my proposal would have been that we
need yet another Sitemaps-like XML format that we all agree to and it
serves the needs of all current OER publishers. Who's up for leading
that proposal? I'm not! And what are its chances of success given
point (a)? Minimal at best.
2. The "specialised" one: a domain-specific robots.txt-like solution
which is oer.txt. Given my reservations about the general approach,
this won. oer.txt is not about specifying a new format but is a way to
advertise what already exists. It's a business card to learn how to
get in touch.
However, I would love it if we can figure out a good general solution.
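As a rough illustration of the "hooking" approach, here is a minimal sketch of how a consumer might pull pointer lines out of a robots.txt body, the way the `Sitemap:` line works today. The function name `find_directive` and the sample file contents are assumptions made up for this example; nothing here is part of the oer.txt proposal.

```python
def find_directive(robots_txt, name="Sitemap"):
    """Return the values of every `name:` line in a robots.txt body."""
    values = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments, then split "Key: value" on the first colon.
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == name.lower():
            values.append(value.strip())
    return values


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
"""

print(find_directive(SAMPLE))  # prints ['http://example.org/sitemap.xml']
```

The same lookup would work for any other agreed directive name, which is the sense in which robots.txt acts as a general hook.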
> oer.txt as currently proposed suffers from a lack of flexibility: the
> current proposal doesn't say anything about why the resources linked
> are relevant, and for what purpose (i.e., for a particular subject? a
> particular education level? Ideally this is published with the
> resource, but you can imagine situations where other curators have
> differing views). It also doesn't provide any self-description of
> who's asserting that they're OER. You could conceivably look at the
> domain registration for information about this, but that's further
> muddled by the fact that you're [potentially] pointing at other
> domains: which registration do you retrieve, and what's the
> relationship of one to the other?
This type of flexibility is intentionally outside the scope of
oer.txt. The way I see oer.txt is that it's a guide. Imagine reaching
a road junction and you see a sign that says "Cambridge" pointing left,
one sign saying "London" pointing right, and one saying "Airport" up
ahead. The signs don't care what Cambridge or London or Airport mean,
but it matters where they are. oer.txt is exactly like that: an OER
road sign that says "there is RSS at this URL and there is an OAI
endpoint at that URL". If you want Cambridge, you turn left and if you
want RSS you go to that URL.
Describing the content is up to the URLs oer.txt points to. For
example, MIT has two main RSS feeds (one for video & audio and one for
text), Stanford has an RSS feed for each course, and you can get
YouTube Edu videos using RSS and the API.
As to who's asserting it's OER, it's the content producer as trusted
by the consumer, which is a bit of a weak point in this proposal.
Content consumers need to trust the website when it says it has OER.
Intuitively, I would trust someuniversity.edu saying it has OER more
than I would trust (say) cnn.com. So there is some responsibility put
on the consumers.
I'd also point out that "lying" about having OER is a form of spam and
if it gets too big then it will be a problem. For a large content
consumer like Google, they have a dedicated web spam team (headed by
Matt Cutts who you might have heard of).
> Finally, the use of a well known location at the domain level
> effectively restricts who can publish these maps/assertions to people
> who control domains. In a world of linked data, that seems like an
> unnecessary limitation.
The restriction to those who run domains is a very valid concern. I
can only point out that I've yet to see an OER website that does not
have a robots.txt file. MIT for example: http://ocw.mit.edu/robots.txt
and Connexions: http://cnx.org/robots.txt. Of all the ones I know, I
can't think of one hosted in a subdirectory.
However, you triggered an idea for a solution: how favicons are
handled by browsers. By convention, browsers expect to find a
website's favicon as a file called favicon.ico at the domain's root,
but webmasters can specify an alternate location in the HTML. Perhaps
this is a good analogy to learn from? Suppose that we say oer.txt
should live at the website's root by default, but you can optionally
state its actual location in the HTML. This opens up the proposal to
every single OER producer. See http://en.wikipedia.org/wiki/Favicon
for reference.
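To make the favicon analogy concrete, here is a minimal sketch of what a consumer's lookup could be: check the page for a `<link>` element that names the file's location, and fall back to `/oer.txt` at the site root otherwise. The `rel="oer"` value, the function name `oer_txt_location`, and the example URLs are all hypothetical; no such link relation has been agreed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkFinder(HTMLParser):
    """Remember the href of the first <link> whose rel contains `rel`."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if self.rel in rels and attr.get("href"):
            self.href = attr["href"]


def oer_txt_location(page_url, html, rel="oer"):
    """Favicon-style lookup: explicit <link> first, else /oer.txt at the root."""
    finder = _LinkFinder(rel)
    finder.feed(html)
    if finder.href:
        return urljoin(page_url, finder.href)
    return urljoin(page_url, "/oer.txt")


# Hypothetical page published from a sub-directory of someone else's domain.
page = '<html><head><link rel="oer" href="oer.txt"></head></html>'
print(oer_txt_location("http://example.edu/~alice/", page))
```

With the explicit link the result stays inside the publisher's own directory; without it the lookup falls back to the domain root, as browsers do for favicon.ico.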
> It seems like a better model for enabling discovery is using a machine
> readable document linked from the resource or root page of a
> collection. This could provide simple pointers to services, similar to
> oer.txt, but could also provide richer information about the resources
> on a site. Ideally this would be in a general format (I'm biased to
> RDF, but you can imagine using POWDER or something else) that would
> allow you to provide additional information about resources in
> addition to labeling them as "educational".
How would you know the URLs of the roots of collections? We could make
this work if we agree how to automatically identify a collection's
root - perhaps a special sitemap? But we'd still need a way to specify
the URLs of tools like OAI harvesting and search.
This line of thinking forces us to recognize that there are two types
of URLs oer.txt can advertise:
1. A collection-specific URL, like an RSS feed of the course lectures
2. A cross-collection URL, like a search engine or a metadata harvesting API
Should we handle these differently? Perhaps oer.txt is better suited
for (2) but autodiscovery from the root home pages is best for (1).
Thoughts?
Thanks again for the comments!
Pierre
--
Pierre Far, PhD
About me: http://www.pierrefar.com/
NEW! OpenCourseWare Search: http://www.ocwsearch.com/
Webmaster and SEO resources: http://ekstreme.com
Blog of Science: http://blogsci.com
Thanks for the detailed reply; comments inline below.
I'm not entirely sure that I see the distinction between your general
and specialized approaches: even though oer.txt does not define a new
format for the resources themselves, it does have a specific format
for describing where to look. It also seems like the absence of a
heavyweight equally applies in any of these scenarios: publishers are
going to wonder what's in it for them, regardless of whether we're
asking them to edit robots.txt, oer.txt, or something else. I don't
have a good answer to the "heavyweight" question, although I do think
that many in the community are willing to experiment and publish
pointers to existing services, if asked.
I think the answer to general vs. specific is describing things using
a general format with domain specific semantics. To draw an analogy to
some of our past work, we pushed development of RDFa at the W3C as a
general solution so we could use it to address domain specific
problems (labeling licenses). This has allowed us to continue to build
on that technology in a very flexible manner.
See below for more concrete suggestions in this vein.
>
>> oer.txt as currently proposed suffers from a lack of flexibility: the
>> current proposal doesn't say anything about why the resources linked
>> are relevant, and for what purpose (i.e., for a particular subject? a
>> particular education level? Ideally this is published with the
>> resource, but you can imagine situations where other curators have
>> differing views). It also doesn't provide any self-description of
>> who's asserting that they're OER. You could conceivably look at the
>> domain registration for information about this, but that's further
>> muddled by the fact that you're [potentially] pointing at other
>> domains: which registration do you retrieve, and what's the
>> relationship of one to the other?
>
> This type of flexibility is intentionally outside the scope of
> oer.txt. The way I see oer.txt is that it's a guide. Imagine reaching
> a road junction and you see a sign that says "Cambridge" pointing left,
> one sign saying "London" pointing right, and one saying "Airport" up
> ahead. The signs don't care what Cambridge or London or Airport mean,
> but it matters where they are. oer.txt is exactly like that: an OER
> road sign that says "there is RSS at this URL and there is an OAI
> endpoint at that URL". If you want Cambridge, you turn left and if you
> want RSS you go to that URL.
That's useful; I think some explicit setting of scope would be helpful
for any proposal in this space. I also think that a solution that
allows you to [optionally] add information regarding why it's OER,
who's making the statement, etc should be preferred.
>
> Describing the content is up to the URLs oer.txt points to. For
> example, MIT has two main RSS feeds (one for video & audio and one for
> text), Stanford has an RSS feed for each course, and you can get
> YouTube Edu videos using RSS and the API.
>
> As to who's asserting it's OER, it's the content producer as trusted
> by the consumer, which is a bit of a weak point in this proposal.
> Content consumers need to trust the website when it says it has OER.
> Intuitively, I would trust someuniversity.edu saying it has OER more
> than I would trust (say) cnn.com. So there is some responsibility put
> on the consumers.
Completely concur that "normal" trust mechanisms apply. I think that's
true regardless of how much information a publisher/curator provides
as background.
>
> I'd also point out that "lying" about having OER is a form of spam and
> if it gets too big then it will be a problem. For a large content
> consumer like Google, they have a dedicated web spam team (headed by
> Matt Cutts who you might have heard of).
>
>> Finally, the use of a well known location at the domain level
>> effectively restricts who can publish these maps/assertions to people
>> who control domains. In a world of linked data, that seems like an
>> unnecessary limitation.
>
> The restriction to those who run domains is a very valid concern. I
> can only point out that I've yet to see an OER website that does not
> have a robots.txt file. MIT for example: http://ocw.mit.edu/robots.txt
> and Connexions: http://cnx.org/robots.txt. Of all the ones I know, I
> can't think of one hosted in a subdirectory.
Right. Not saying that people won't publish to robots.txt, or oer.txt,
just that it'd be nice to enable individuals, etc (at the
"sub-directory level", so to speak) to also publish pointers (I'll
admit this may again be an issue of me not understanding scope
entirely).
>
> However, you triggered an idea for a solution: how favicons are
> handled by browsers. By convention, browsers expect to find a
> website's favicon as a file called favicon.ico at the domain's root,
> but webmasters can specify an alternate location in the HTML. Perhaps
> this is a good analogy to learn from? Suppose that we say oer.txt
> should live at the website's root by default, but you can optionally
> state its actual location in the HTML. This opens up the proposal to
> every single OER producer. See http://en.wikipedia.org/wiki/Favicon
> for reference.
Yes; see below.
>
>> It seems like a better model for enabling discovery is using a machine
>> readable document linked from the resource or root page of a
>> collection. This could provide simple pointers to services, similar to
>> oer.txt, but could also provide richer information about the resources
>> on a site. Ideally this would be in a general format (I'm biased to
>> RDF, but you can imagine using POWDER or something else) that would
>> allow you to provide additional information about resources in
>> addition to labeling them as "educational".
>
> How would you know the URLs of the roots of collections? We could make
> this work if we agree how to automatically identify a collection's
> root - perhaps a special sitemap? But we'd still need a way to specify
> the URLs of tools like OAI harvesting and search.
I guess "root of collection" isn't quite clear; I meant that instead
of retrieving oer.txt from ocw.mit.edu (for example), you retrieve
http://ocw.mit.edu/ and look for information there (RDFa, POWDER, etc)
that points to specific services (ie, an RSS feed, OAI-PMH endpoint,
etc), or a separate resource that describes a set (preferably in a
similar language). The SIOC (http://sioc-project.org/) services module
defines one way this could be described using has_service and
service_protocol; we use this on CC Network with RDFa in the document
header:
<link about="/" rel="sioc_service:has_service" href="/r/lookup" />
<link about="/r/lookup" rel="sioc_service:service_protocol"
href="http://wiki.creativecommons.org/work-lookup" />
This is similar to the favicon approach, but instead of saying
"default to retrieving /oer.txt", you say "retrieve / and we'll tell
you what to do from there."
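A consumer for the "retrieve / and follow the pointers" model could look roughly like the sketch below, which pairs each `has_service` link with its `service_protocol`. This is only an illustration under assumptions: it matches the `sioc_service:` prefix literally rather than resolving RDFa namespaces as a real RDFa processor would, and the base URL `http://example.org/` and the names `ServiceLinks` and `discover_services` are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ServiceLinks(HTMLParser):
    """Collect (about, rel, href) triples from sioc_service <link> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href and rel.startswith("sioc_service:"):
            # Resolve relative subjects and objects against the page URL.
            about = urljoin(self.base_url, attr.get("about") or "")
            self.triples.append((about, rel, urljoin(self.base_url, href)))


def discover_services(base_url, html):
    """Return [(service_url, protocol_url_or_None), ...] found in the page."""
    parser = ServiceLinks(base_url)
    parser.feed(html)
    protocols = {s: o for s, p, o in parser.triples
                 if p == "sioc_service:service_protocol"}
    return [(o, protocols.get(o)) for s, p, o in parser.triples
            if p == "sioc_service:has_service"]


# The two <link> elements quoted above, served from a hypothetical host.
HEAD = '''
<link about="/" rel="sioc_service:has_service" href="/r/lookup" />
<link about="/r/lookup" rel="sioc_service:service_protocol"
      href="http://wiki.creativecommons.org/work-lookup" />
'''

print(discover_services("http://example.org/", HEAD))
```

The protocol URL is what tells a consumer how to talk to the service, so an RSS feed and an OAI-PMH endpoint could be advertised the same way with different protocol values.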
>
> This line of thinking forces us to recognize that there are two types
> of URLs oer.txt can advertise:
> 1. A collection-specific URL, like an RSS feed of the course lectures
> 2. A cross-collection URL, like a search engine or a metadata harvesting API
>
> Should we handle these differently? Perhaps oer.txt is better suited
> for (2) but autodiscovery from the root home pages is best for (1).
> Thoughts?
I'm not convinced the two cases are all that different, and I think
that looking for a solution that accommodates both will help guide the
general v. specialized path.
Thanks for taking the time to put together your draft and helping
everyone think about these issues. I hope the above is helpful and
clarifies my thoughts.
Best,
Nathan
I have a big question about this relating to discovery, which Nathan
knows I'm a bit obsessed with right now. With robots.txt, there are
basically three orgs (Y, G and MS) who access them at scale. Who would
access the oer.txt files besides these three? And of those others, who
wouldn't already know what they need to know about the site (meaning
they already harvest stuff from your site)?
Thanks. Seems like there are a few objectives floating around?
1) Service/end-point discovery. Provide a way for sites to tell folks where their feeds are located and in what formats they can be retrieved. E.g., “You can find an RSS of OER at [/xyz]. You can find an OAI-PMH search interface to OER at [/abc].”
2) OER iteration. This is more like a traditional site map, that lets all comers iterate over a structured list of all resources on the site. E.g., “Here’s a file where you can find info on all the resources on our site in [abc] format.”
3) Declare that there exist OER on the site and what license they are in. Provide some namespace hints as to where they are. E.g., “This site has OER licensed CC-by-30. You’ll find them under [/xyz].”
#2 could be subsumed into #1 but I’d guess that oer.txt is proposed for #2 b/c #1 is a little cumbersome/complicated for very simple instances? #3 is mostly analogous to robots.txt but for OER.
Not sure if I’ve captured the elements here, but hopefully helpful.
Best,
Steve
Is there reason to believe big search engines will use this
information if they can discover it? (Actually curious -- are there
conversations people have had that indicate this is a blocker?).
NRY
Holding that aside for this discussion, I do like their design goals for sure. Related to this, we have been looking a little at POWDER which seems to provide similar capabilities? OPML seems nicer to me b/c it's easier to understand (and presumably implement). Do you have any opinions on the difference between the two?
Moving a little further, it seems like OPML could meet all the criteria for oer.txt but only if you accept that you want to present all your OER in a hierarchical list? I could imagine wanting to share info differently (RDF-ish maybe), though the OPML method does seem like it would be simpler and easier (both to write and read).
With that line of thinking in mind, what are the limitations of RSS itself for this purpose? Why not just publish an RSS feed of OER for a site? What does OPML get you that is valuable over RSS?
And more broadly what does this group want from oer.txt that either RSS or OPML can't do (if anything)?
Best,
Steve
________________________________________
From: oertxt-wor...@googlegroups.com [oertxt-wor...@googlegroups.com] On Behalf Of Scott Wilson [scott.brad...@gmail.com]
Sent: Friday, December 10, 2010 9:48 AM
To: oer.txt Working Group
Subject: Re: Initial Thoughts on oer.txt
There is already a format that meets all the use cases, and that is: OPML.
It's stale because it has been very widely adopted and hasn't needed to be changed. It's more a set of conventions than a specification - in many ways it's a "bad" specification, but is very successful despite that!
Here are some of the applications that currently support OPML:
Google Reader
iGoogle
Bloglines
Netvibes
Yahoo
Windows Live
Blogger
Wordpress
Outlook
Drupal
Internet Explorer (!)
Sony PSP (!)
Firefox
iTunes
Anywhere on the web you see a term like "subscriptions", "subscription list" or "blogroll", it's referring to OPML. In any application that says something like "import list of feeds" it usually means "in OPML".
On 10 Dec 2010, at 16:27, Midgley, Steve wrote:
>
> Holding that aside for this discussion, I do like their design goals for sure. Related to this, we have been looking a little at POWDER which seems to provide similar capabilities? OPML seems nicer to me b/c it's easier to understand (and presumably implement). Do you have any opinions on the difference between the two?
>
> Moving a little further, it seems like OPML could meet all the criteria for oer.txt but only if you accept that you want to present all your OER in a hierarchical list? I could imagine wanting to share info differently (RDF-ish maybe), though the OPML method does seem like it would be simpler and easier (both to write and read).
>
> With that line of thinking in mind, what are the limitations of RSS itself for this purpose? Why not just publish an RSS feed of OER for a site? What does OPML get you that is valuable over RSS?
Not a lot. It's not really about the format, more the levels of adoption. For example, if OERs are exported as an OPML file (as a collection of feeds), you can immediately import them into one of the above without modification.
RSS has very similar capabilities, it just isn't used for the same purpose in existing software, so clicking the "import subscriptions" button won't usually give you an RSS option.
I think the use cases are:
- if OERs are exposed as a flat list of individual resources, use RSS
- if OERs are exposed as a list of RSS feeds each of which is a list of resources (e.g. albums or courses), use OPML
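For the second case, a list of per-course feeds, a minimal sketch of generating the OPML with Python's standard library might look like this. The function name `build_opml`, the course titles and the feed URLs are placeholders invented for the example; the `outline` attributes `type`, `text` and `xmlUrl` follow the usual OPML subscription-list convention.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML subscription list from (name, feed_url) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One <outline> per RSS feed, e.g. one per course or album.
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


# Placeholder feeds, for illustration only.
FEEDS = [
    ("Course A lectures", "http://example.edu/courses/a/rss.xml"),
    ("Course B lectures", "http://example.edu/courses/b/rss.xml"),
]

print(build_opml("OER feeds at example.edu", FEEDS))
```

A file like this is the kind of thing an "import subscriptions" button would be expected to accept, which is the adoption argument made above.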
Thanks!
Steve
________________________________________
From: oertxt-wor...@googlegroups.com [oertxt-wor...@googlegroups.com] On Behalf Of Scott Wilson [scott.brad...@gmail.com]
Sent: Friday, December 10, 2010 11:48 AM
To: oertxt-wor...@googlegroups.com
Just to back up Scott and Nathan here, use of SWORD for scholarly research objects (which uses Atom) is what we would want to see built upon (used worldwide already). There are far too many tools/libs for the RSS/Atom/OPML family to go and have YATP (yet another transport protocol) re oer.txt to build stuff for; though I agree exploration of RDF shoehorning would be worth building upon (previous work includes ORE to Atom), and SWORD 2 is experimenting with predicates in Atom. Speak with Richard Jones and Stuart Lewis. /dff