I have just written up a little overview of how folksonomies, ontologies, tagging and Atom fit together [1]. One point I make is that often one can concatenate the scheme + "/" + term to get the URL of the category, at which one can retrieve all the entries that belong to the category. This works with Tim Bray's feed, and it is the behavior of del.icio.us too. I suppose that because the scheme is not necessarily a URL this won't work in every case.
Would it not have been nice if we could have had a system whereby scheme+term gives us a URI with which we can then identify the category itself? What do people do now? Is it useful to assume that this is the case when writing a client?
Henry Story wrote: > [snip] > Would it not have been nice if we could have had a system whereby > scheme+term gives us a URI with which we can then identify the category > itself? What do people do now? Is it useful to assume that this is the > case when writing a client?
> It's useful in many cases but definitely not in all. Feed consumers > should not assume that this pattern is being used.
There definitely seems to be an expectation among a lot of feed producers (including Tim Bray) that this is how it should work. Otherwise, why bother having categories that line up so nicely with URLs that point to documents containing representations of all the entries from that category?
Would it be good to have some kind of best practices manual that would tie in these loose ends?
What would be the best way to tie this one up for a client? Should he do the following on first seeing a new category:
    if (category.scheme() instanceof URL) {
        URL scheme = (URL) category.scheme();
        // Ask the server whether anything lives at the scheme location.
        Request call = new Request(Method.HEAD, scheme.toString());
        Client client = new Client(Protocol.HTTP);
        Response response = client.handle(call);

        if (response.getStatus().getCode() == 200) {
            try {
                // What is the best way to search for this id?
                URL catid = new URL(scheme, category.term());
                category.setId(catid);
            } catch (MalformedURLException e) {
                // scheme + term is not a valid URL: leave the category without an id
            }
        }
    }
Perhaps he should then do a HEAD on catid too and see if there is something there. If so, he could make the category display as a hyperlink in the UI?
Does this seem like a good idea? Anyone else tried this?
> Henry Story wrote: >> [snip] >> Would it not have been nice if we could have had a system whereby >> scheme+term gives us a URI with which we can then identify the >> category >> itself? What do people do now? Is it useful to assume that this is >> the >> case when writing a client?
Assuming that scheme and term can or even may be concatenated into a URI can have some detrimental effects when doing so was not the intention of the publisher. Case in point: for IBM's activities work, our schemes initially used http: URIs in the ibm.com domain. The idea was that the scheme would eventually point to a resource describing the scheme and the product. However, we quickly discovered that some clients were combining the scheme and term and attempting to dereference the URI, causing the folks who run ibm.com a lot of grief because of a whole bunch of 404 errors that were suddenly showing up in their logs. Now, this is easily preventable, of course, and was quickly addressed, but the point remains: unless the publisher of the feed intends the scheme and term to be combined to produce something useful, consumers should not assume they can do so.
That said, since that early experience, I've been recommending that folks not use http URIs for category schemes if they do not intend for folks to dereference them.
> From: owner-atom-proto...@mail.imc.org > [mailto:owner-atom-proto...@mail.imc.org] On Behalf Of Henry Story > Sent: 01 November, 2006 11:23 > To: atom-owl@googlegroups.com > Cc: atom-protocol Protocol > Subject: Re: categories and tagging
> On 1 Nov 2006, at 16:53, James M Snell wrote: >> It's useful in many cases but definitely not in all. Feed > consumers >> should not assume that this pattern is being used.
> There seems to definitively be an expectation by a lot of > feed producers (including Tim Bray) that this is how it > should work, otherwise why bother having categories that line > up so nicely with urls that point to documents containing > representations of all the entries from that category?
> Would it be good to have some kind of best practices manual > that would tie in these loose ends?
The problem with this strategy is that you cause problems for controlled vocabularies that don't use this approach. There are many in the library and other communities that have issues with both RSS 2.0 and Atom's specification of categories.
One area that is problematic is how to specify concepts in controlled vocabularies that are encoded using the emerging W3C SKOS specification. SKOS is an RDF application. As such it is URI focused. The SKOS community would like to specify these concepts in RSS 2.0 and Atom, but there are issues.
In RSS 2.0 the specification allows for a domain attribute (Atom's scheme attribute is similar) and the category's term, but the category term is supposed to be a slash-delimited value. That causes problems for a number of controlled vocabularies where slash is a valid character in the concept's label, and RSS 2.0 provides no way to escape a slash.
Atom has followed a similar strategy minus the slash-delimited content nonsense, I think. However, it still presents problems for using controlled vocabularies encoded in SKOS. The issue is the separation of the URI and the category/concept's label. In a folksonomy you might do the following:
domain = URI
category = cats
However, in SKOS you have a URI to the concept "cats", period. You could map the domain to be the SKOS concept scheme's URI and use the concept's label as the category content in Atom. But concatenating the SKOS concept scheme URI and the concept label doesn't necessarily produce the URI to the concept. For example, in SKOS you might have the following:
The reason for doing this is that labels *can* be specified in multiple languages, whereas the URI for the concept is a constant that doesn't change based upon the language. Thus, if you were to do what you are proposing, you would prevent the SKOS community from using SKOS concepts in Atom.
Unfortunately Atom doesn't permit just specifying a URI to a concept. So using controlled vocabularies encoded in SKOS is still an issue with Atom. Using the cats example, one would have to do the following:
That seems like a reasonable mapping between SKOS's specification of the concept cats and Atom's specification of a category. But it isn't, because I just chose the English label for cats. Someone else could have chosen the Spanish or French label for cats, and an aggregator will probably think the two Atom categories are different.
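A minimal Java sketch of that mismatch follows. Everything in it is an assumption made for illustration: the record name, the example.org URIs and the idea that a consumer has the SKOS concept URI at hand (Atom itself gives it no place to live). It only shows that comparing scheme and term treats two language labels as different categories, while comparing concept URIs would not.

```java
import java.util.Objects;

// Hypothetical model: an Atom category plus the SKOS concept URI it was derived from.
record LabelledCategory(String scheme, String term, String conceptUri) {

    // What a plain Atom aggregator can compare: scheme and term only.
    boolean sameAtomCategory(LabelledCategory other) {
        return Objects.equals(scheme, other.scheme) && term.equals(other.term);
    }

    // What a SKOS-aware consumer could compare if the concept URI were available.
    boolean sameConcept(LabelledCategory other) {
        return conceptUri.equals(other.conceptUri);
    }

    public static void main(String[] args) {
        String scheme = "http://example.org/scheme/";          // hypothetical concept scheme
        String concept = "http://example.org/concept/13745";   // hypothetical concept URI
        LabelledCategory english = new LabelledCategory(scheme, "cats", concept);
        LabelledCategory spanish = new LabelledCategory(scheme, "gatos", concept);
        System.out.println("same Atom category: " + english.sameAtomCategory(spanish));
        System.out.println("same SKOS concept:  " + english.sameConcept(spanish));
    }
}
```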
You could say, well the aggregator could determine from the SKOS encoding that the English, Spanish and French labels were the same and map them to the same bucket. However, there is no way in Atom to specify that the URI is associated with a SKOS encoding.
In addition, controlled vocabularies sometimes deprecate the preferred term. So cats might become feline or, worse, the concept might split and have references to two different preferred labels.
So the current situation for specifying categories in both RSS 2.0 and Atom is problematic. Please don't make it worse.
> However, in SKOS you have a URI to the concept "cats", period. > You could map the domain to be the SKOS concept scheme's URI > and use the concepts label as the category content in Atom. But > concatenating the SKOS concept scheme URI and the concept label > doesn't necessarily produce the URI to the concept. For example, > in SKOS you might have the following:
> From: owner-atom-proto...@mail.imc.org > [mailto:owner-atom-proto...@mail.imc.org] On Behalf Of Henry Story > Sent: 01 November, 2006 13:01 > To: atom-owl@googlegroups.com > Cc: atom-protocol Protocol > Subject: Re: categories and tagging
> Ok so in AtomOwl, following my proposal [1] we could > interpret the following atom
The problem is that "13745" isn't the "term"; most likely it is the concept's internal identifier in the controlled vocabulary. So while you can smush that part of the URI into the term attribute, it is not quite the same. Another problem is that you can have different URIs for a SKOS concept scheme and a concept. This is perfectly valid:
Assuming that my.scheme.net is a registry for vocabularies and my.concept.net is a repository for concepts.
Under your proposal you are making assumptions about the structure of a URI, which is something you are not supposed to do. Also, SKOS is URI based, which means you can use any URI scheme other than HTTP, which may have different construction rules than just combining the scheme and term attributes with a slash and doing an HTTP GET on the result. As a matter of fact, the URI might not even be resolvable.
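As a rough illustration, a cautious client could at least check which URI scheme it is looking at before attempting any HTTP request. The Java sketch below is an assumption-laden example, not anything defined by Atom or APP: the class and method names are hypothetical, and passing the check says nothing about whether the publisher intends the URI to be dereferenced.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical guard a client might apply to a category's scheme attribute.
final class SchemeCheck {

    // True only when the value parses as a URI whose scheme is http or https.
    static boolean looksHttpDereferenceable(String schemeIri) {
        if (schemeIri == null) {
            return false;
        }
        try {
            String protocol = new URI(schemeIri).getScheme();
            return "http".equalsIgnoreCase(protocol) || "https".equalsIgnoreCase(protocol);
        } catch (URISyntaxException e) {
            return false; // not even a syntactically valid URI
        }
    }

    public static void main(String[] args) {
        System.out.println(looksHttpDereferenceable("http://www.tbray.org/ongoing/What/"));
        System.out.println(looksHttpDereferenceable("urn:example:vocabulary"));
    }
}
```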
Thomas, I don't think that this is a natural reading of "term" in the atom syntax list.
[[ The "term" attribute is a string that identifies the category to which the entry or feed belongs. Category elements MUST have a "term" attribute. ]]
Nowhere is an IRI mentioned there, whereas just below
[[ The "scheme" attribute is an IRI that identifies a categorization scheme. Category elements MAY have a "scheme" attribute. ]] [2]
The scheme attribute is defined in terms of an IRI.
To give a bit more context to what Andrew was saying, he was arguing for a mapping between the SKOS [3] vocabulary and the Atom vocabulary. SKOS is indeed very interesting. It allows one to say something like
And this seems to be becoming quite a common way people are setting things up, and also it has some continuity with what the RSS2 folks were doing.
[[ <category> is an optional sub-element of <item>.
It has one optional attribute, domain, a string that identifies a categorization taxonomy.
The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. Two examples are provided below: ]] [5]
It follows that what we have is something that can be expressed in RDF by saying that the :scheme and the :term relation form a CIFP [8], i.e. together they uniquely identify one thing, and furthermore that the identity of the thing is given by the concatenation of those two strings.
This therefore seems to capture behavior that is not present in SKOS, but apart from that the two should be quite complementary. Let us see how we can make them more so.
What we need perhaps is some way to make clear what the URL of the category is. We could do this as follows:
1. Add a new attribute to identify the category (let's call it catid).
2. Assume that if a catid is not present, and we have a scheme and a term attribute, the catid is formed by the concatenation scheme+term.
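The two rules above can be sketched in Java as follows. This is only an illustration of the proposal: catid is not part of Atom, the record and method names are hypothetical, and plain string concatenation of scheme and term is assumed because that is how the rule is worded here.

```java
import java.util.Optional;

// Hypothetical model of an atom:category carrying the proposed catid attribute.
record ProposedCategory(String term, String scheme, String catid) {

    // Rule 1: an explicit catid wins. Rule 2: otherwise fall back to scheme+term.
    // With neither a catid nor a scheme there is nothing to derive an id from.
    Optional<String> resolvedId() {
        if (catid != null) {
            return Optional.of(catid);
        }
        if (scheme != null) {
            return Optional.of(scheme + term);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ProposedCategory places =
            new ProposedCategory("Places", "http://www.tbray.org/ongoing/What/", null);
        System.out.println(places.resolvedId().orElse("(no id)"));
    }
}
```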
Now because the "term" is mandatory in atom (and not the scheme), I suggest that one use the skos:prefLabel for it. I know there is a label too, but well, it certainly makes it easier to search for similar categories using the SPARQL type queries I put forward in [1].
> Would it not have been nice if we could have had a system whereby > scheme+term gives us a URI with which we can then identify the > category itself? What do people do now? Is it useful to assume that > this is the case when writing a client?
This is probably best communicated to the client as a collection feature, or?
> Thomas, I don't think that this is a natural reading of "term" in the > atom syntax list.
Andrew Houghton was talking about SKOS (which I don't know anything about) and said: [[ However, in SKOS you have a URI to the concept "cats", period. You could map the domain to be the SKOS concept scheme's URI and use the concepts label as the category content in Atom. But concatenating the SKOS concept scheme URI and the concept label doesn't necessarily produce the URI to the concept. For example, in SKOS you might have the following:
My answer is a bare mapping of this description into an atom:category element.
> [[ > The "term" attribute is a string that identifies the category to > which the entry or feed belongs. Category elements MUST have a "term" > attribute. > ]]
> nowhere is there mentioned a IRI there,
IRIs are not forbidden either, and Andrew's description makes me think the "concept URI" *is* the "term".
>> [[ >> The "term" attribute is a string that identifies the category to >> which the entry or feed belongs. Category elements MUST have a "term" >> attribute. >> ]]
>> nowhere is there mentioned a IRI there,
> IRIs are not forbidden either, and Andrew's description makes me think > the "concept URI" *is* the "term".
The question is: how does this help any of us? It may look like it is a "term", but what is a client meant to do with all this information?
What I am proposing is that we put forward some best practice to formalize a useful and RESTful way to publish this information, so that clients can use it. With APP we could do something like this: we could define for example that when entries are published and they contain categories that have a scheme that is accepted by the collection, then the entry will be found in the feed that is to be found either by appending scheme+term or in the catid location I mentioned previously.
and his collection manages the <http://www.tbray.org/ongoing/What/> scheme, as defined perhaps in the service document, (and perhaps we can place the list of available categories at that scheme location!) then his client will know that the entry will also be found in the <http://www.tbray.org/ongoing/What/Places> collection.
Now this would be useful for an APP publishing client, and it would be useful for an APP reader, because it could find some useful information at these various locations, and it would save us having to define an unending number of link relations that parallel the categories we have, when it is in fact clear that everybody intends to use scheme+term as a uri.
> >> [[ > >> The "term" attribute is a string that identifies the category to > >> which the entry or feed belongs. Category elements MUST have a "term" > >> attribute. > >> ]]
> >> nowhere is there mentioned a IRI there,
> > IRIs are not forbidden either, and Andrew's description makes me think > > the "concept URI" *is* the "term".
> The question is: how does this help any of us? It may look like it is > a "term", but what is a client meant to do with all this information?
Nothing. A client is not meant to do anything with atom:category elements other than for categorizing the entry or feed.
You can tell the reader that the entry is in the "Places" category, you can provide a "show other entries within this category" feature, you can group entries by their category (in a treeview: root nodes are the list of schemes, their child nodes are the list of terms, presented using the provided @label; if there are different @label used, you can default to the latest and provide a tooltip or other contextual info such as "a.k.a. Locations, Where"), etc.
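A minimal Java sketch of the grouping described above, under stated assumptions: the class and record names are hypothetical, a missing scheme is bucketed under a placeholder key, and "latest label" is taken to mean the label of the last category seen in the input list.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that builds a scheme -> term -> label tree for display.
final class CategoryTree {

    record Cat(String scheme, String term, String label) {}

    static Map<String, Map<String, String>> group(List<Cat> cats) {
        Map<String, Map<String, String>> tree = new TreeMap<>();
        for (Cat c : cats) {
            String scheme = c.scheme() == null ? "(no scheme)" : c.scheme();
            // Fall back to the term when no label is given.
            String label = c.label() == null ? c.term() : c.label();
            // put() overwrites, so the label seen last is the one kept.
            tree.computeIfAbsent(scheme, k -> new TreeMap<>()).put(c.term(), label);
        }
        return tree;
    }

    public static void main(String[] args) {
        List<Cat> cats = List.of(
            new Cat("http://www.tbray.org/ongoing/What/", "Places", "Where"),
            new Cat("http://www.tbray.org/ongoing/What/", "Places", "Locations"),
            new Cat(null, "misc", null));
        group(cats).forEach((scheme, terms) -> System.out.println(scheme + " -> " + terms));
    }
}
```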
> Since scheme is a URL I can presumably go there to find something. But what?
Some people also want to dereference XML Namespaces' URIs.
> Term is not defined to be a URI, and in the above example it is not, > and so why should I do anything with the term below?
There's no reason you would do anything with it either.
> What I am proposing is that we put forward some best practice to > formalize a useful and RESTful way to publish this information, so > that clients can use it. With APP we could do something like this: we > could define for example that when entries are published and they > contain categories that have a scheme that is accepted by the > collection, then the entry will be found in the feed that is to be > found either by appending scheme+term or in the catid location I > mentioned previously.
-1. But you can still do it yourself in your own implementation, possibly using an f:feature to communicate the feature to clients.
> and his collection manages the <http://www.tbray.org/ongoing/What/> > scheme, as defined perhaps in the service document, (and perhaps we > can place the list of available categories at that scheme location!) > then his client will know that the entry will also be found in the > <http://www.tbray.org/ongoing/What/Places> collection.
I don't see how this is useful, but you might have good reasons.
> Now this would be useful for an APP publishing client, and it would > be useful for an APP reader, because it could find some useful > information at these various locations,
I understand the need to provide a "category URI" in some scenarios but that should be an extension to the atom:category element or a "mapping mechanism" communicated by a feed-level or entry-level extension, but please no "global assumption".
> and it would save us having > to define an unending number of link relations that parallel the > categories we have, when it is in fact clear that everybody intends > to use scheme+term as a uri.
Do you mean scheme+term, scheme+"/"+term or scheme+"#"+term? or maybe scheme+"/"+term+".atom"? or scheme+"/tags/"+term?
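To make the ambiguity concrete, here is a small Java sketch that lists the candidate URIs a client might guess from one scheme/term pair. The class and method names are hypothetical, the scheme and term are the tbray.org example from this thread, and none of the derived URIs is claimed to resolve.

```java
import java.util.List;

// Hypothetical helper: enumerates the ways a client might guess a category URI.
final class CategoryUriGuesses {

    static List<String> candidates(String scheme, String term) {
        return List.of(
            scheme + term,
            scheme + "/" + term,
            scheme + "#" + term,
            scheme + "/" + term + ".atom",
            scheme + "/tags/" + term);
    }

    public static void main(String[] args) {
        for (String uri : candidates("http://www.tbray.org/ongoing/What/", "Places")) {
            System.out.println(uri);
        }
    }
}
```

Note that with a scheme that already ends in a slash, the scheme+"/"+term variant yields a double slash, so even the two simplest guesses disagree.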
> From: owner-atom-proto...@mail.imc.org > [mailto:owner-atom-proto...@mail.imc.org] On Behalf Of Thomas Broyer
> Sent: 02 November, 2006 06:19
> To: Atom-Protocol; atom-owl@googlegroups.com; Atom-Syntax
> Subject: Re: categories and tagging
> 2006/11/2, Henry Story:
> > On 2 Nov 2006, at 08:59, Thomas Broyer wrote:
> > > [redirecting to atom-syntax]
> > This is also a protocol issue, because we are asking what to do with the information in the atom feed. [1]
> Not sure how atom-protocol is concerned but let's keep it in atom-protocol too...
It is both a feed issue and a protocol issue because of section 7 in
the APP draft. The APP draft puts forth category documents where I
think there are three concerns: 1) the specification of the category
element (overlap issue), 2) how could SKOS be used as a possible
alternative to category documents, 3) scalability issues with the specification of category documents.
1) Because category documents reuse the Atom syntax for the category
element, this becomes an overlap issue between Atom syntax and Atom
protocol.
2) It would be an ideal convergence between Atom and SKOS to be able to use APP to CRUD (Create, Read, Update, Delete) SKOS concept schemes and concepts in an APP server. Or for that matter, other controlled vocabulary XML grammars such as MARC-XML and Zthes. Providing a generalized solution would plug multiple communities into APP.
3) The current specification of the category document doesn't scale
since it appears to me that this is one document in the APP server.
Let's just say you are dealing with a folksonomy of tags from say
Flickr. There may be several thousand categories, if not several
hundred thousand. It is doubtful that an APP server could return
such a document without HTTP timing out the request. For controlled
vocabularies such as LCSH and MeSH there are about 300,000 and 500,000
categories respectively. Just create a simple category document and
copy and paste a single category element 500,000 times and look at the
size of the category document. It is just not going to be returned
by any HTTP server.
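The copy-and-paste experiment can be approximated with a few lines of Java. This is only a back-of-the-envelope sketch: the class name, the example.org scheme and the sample term are made up, real elements vary in length, and the result says nothing by itself about what a particular server or client can handle.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical estimator for the size of a category document.
final class CategoryDocSize {

    // UTF-8 size of one serialized element multiplied by the number of categories.
    static long estimateBytes(String element, long count) {
        return element.getBytes(StandardCharsets.UTF_8).length * count;
    }

    public static void main(String[] args) {
        String element =
            "<atom:category scheme=\"http://example.org/vocab/\" term=\"example term\"/>\n";
        long bytes = estimateBytes(element, 500_000L);
        System.out.printf("%d bytes (about %.1f MiB)%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```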
As a side note, it seems to me that APP could be used to access and maintain controlled vocabularies in SKOS, MARC-XML, Zthes, etc.
Basically APP provides collections of resources. There is a direct analogy in SKOS where a controlled vocabulary in SKOS contains a collection of skos:Concept resources.
If I understand these concepts correctly, then each skos:Concept could be stored within an app:collection. The same would be true for each skos:ConceptScheme associated with a controlled vocabulary.
The app:collection for both skos:Concept and skos:ConceptScheme would comprise an app:workspace. There could be multiple app:collection containing skos:Concept in an app:workspace.
To draw an analogy to the DDC (Dewey Decimal Classification, another
controlled vocabulary), each app:collection containing skos:Concept would represent the concepts found in a single edition of the DDC.
Each app:workspace represents a complete controlled vocabulary. This means that a single APP server, app:service, could access and maintain multiple controlled vocabularies. Given this mapping, it seems to me that APP could replace the SKOS API and may be a better alternative to library centric protocols such as ADL Thesaurus protocol, OpenURL, SRU
and SRU Record Update.
The current Atom category element is just another instance of an item
in an app:collection. Thus a category document could be used to point
to the individual concepts in an app:collection rather than embedding
the category elements inside the category document. Using this approach
any XML grammar, Atom, MARC-XML, SKOS, Zthes, etc., could be used.
> > Thomas, I don't think that this is a natural reading of "term" in the atom syntax list.
I agree that this is not natural since term is not specified as a URI,
per my reading of the draft, and gets back to my point that you can smush stuff in these attributes, but that may not be the smartest thing to do in the long term.
BTW, generally I prefer not to cross post, this discussion is now
spread across three lists... please confine the discussion to the
Atom protocol list so people can get the complete thread rather than
bits and pieces.
>> and his collection manages the <http://www.tbray.org/ongoing/What/> >> scheme, as defined perhaps in the service document, (and perhaps we >> can place the list of available categories at that scheme location!) >> then his client will know that the entry will also be found in the >> <http://www.tbray.org/ongoing/What/Places> collection.
> I don't see how this is useful, but you might have good reasons.
Well, for one, one could use APP as a default way to post things to flickr, del.icio.us, and other tagging sites. The more people can use APP the better, no?
Each one of them is a URL and can be dereferenced, to find their meaning.
I find it odd that someone speaking from a group that prides itself on being RESTful does not want to get the added advantage that URLs provide. Especially in APP, where we are concentrating so much on the HTTP protocol.
Yes. The information we have currently is useful. But if we get people to organise their web sites the way Tim Bray has, then we could write clients that use this information in much more interesting ways. And we would not need an extension for every obviously good way of doing things.
* Henry Story <henry.st...@bblfish.net> [2006-11-02 16:55]:
> The question is: how does this help any of us? It may look like > it is a "term", but what is a client meant to do with all this > information?
Simple: when the scheme and term of two different entries are identical, then you have confidence that they refer to the same concept. When the scheme URI is absent, the term is ambiguous.
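That rule fits in a few lines of Java. The sketch below is an illustration with hypothetical names; in particular the third outcome, for pairs that are not identical, is an assumption added so the method always returns something, since the rule itself only speaks about identical pairs and absent schemes.

```java
import java.util.Objects;

// Hypothetical comparison of two categories by scheme and term only.
final class CategoryIdentity {

    enum Verdict { SAME_CONCEPT, AMBIGUOUS, NO_CONCLUSION }

    record Category(String scheme, String term) {}

    static Verdict compare(Category a, Category b) {
        // Without a scheme the term alone is ambiguous.
        if (a.scheme() == null || b.scheme() == null) {
            return Verdict.AMBIGUOUS;
        }
        boolean identical =
            a.scheme().equals(b.scheme()) && Objects.equals(a.term(), b.term());
        return identical ? Verdict.SAME_CONCEPT : Verdict.NO_CONCLUSION;
    }

    public static void main(String[] args) {
        Category one = new Category("http://www.tbray.org/ongoing/What/", "Places");
        Category two = new Category("http://www.tbray.org/ongoing/What/", "Places");
        System.out.println(compare(one, two));
        System.out.println(compare(one, new Category(null, "Places")));
    }
}
```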
That’s what scheme and term mean, and that’s all that they mean.
If you want to use a dereferencable protocol scheme for your category’s scheme URI, and want to run a service providing resources at the given URI, that’s fine, and more power to you. But nothing like that is mandated, much less is any approach for deriving a dereferencable URI for a single term.
* Jan Algermissen <algermissen1...@mac.com> [2006-11-02 16:55]:
> On Nov 1, 2006, at 4:22 PM, Henry Story wrote: >> Would it not have been nice if we could have had a system >> whereby scheme+term gives us a URI with which we can then >> identify the category itself? What do people do now? Is it >> useful to assume that this is the case when writing a client?
> This is probably best communicated to the client as > a collection feature, or?