CKAN metadata in RDF

David Read

Oct 15, 2009, 8:09:03 AM
to uk-government-...@googlegroups.com
Hi,

As many of you know, CKAN is the backend storing the metadata for
packages/datasets on hmg.gov.uk/data. One of our aims has been to
publish the CKAN metadata in RDF format (followed by posting it to
Talis CC).

Thanks to assistance from Leigh Dodds, we've started work on this --
with a sample RDF representation of a CKAN package below.

Any suggestions or comments are very welcome. In particular, any
suggestions for specific ontologies to use for describing
datasets/collections of data.

David Read
Open Knowledge Foundation

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:ckan="http://ckan.net/ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://ckan.net/package/rdf/usa-courts-gov">
<rdf:type rdf:resource="http://ckan.net/ns#Package"/>
<foaf:isPrimaryTopicOf>http://ckan.net/package/usa-courts-gov</foaf:isPrimaryTopicOf>
<dc:title>Text of US Federal Cases</dc:title>
<dc:description rdf:parseType="Literal">This is an archive of US
court videos</dc:description>
<foaf:homepage>http://public.resource.org/</foaf:homepage>
<ckan:downloadUrl>http://bulk.resource.org/courts.gov/</ckan:downloadUrl>
<sioc:has_creator>US Courts</sioc:has_creator>
<dc:contributor>Public.Resource.Org</dc:contributor>
<dc:rights>OKD Compliant::Open Data Commons Open Database License
(ODbL)</dc:rights>
<dc:subject>us, courts, case-law, us, courts, case-law, gov,
legal, law, access-bulk</dc:subject>
</rdf:Description>
</rdf:RDF>

Steve Harris

Oct 15, 2009, 8:17:01 AM
to uk-government-...@googlegroups.com

It's perhaps more convenient to list the subjects separately, eg

<dc:subject>us</dc:subject>
<dc:subject>courts</dc:subject>
<dc:subject>case-law</dc:subject>
...

Many RDF stores have ways to search text, but it's not standardised by
SPARQL, and it's more obvious how to query it if they're separate.

- Steve
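
The split Steve describes above can be sketched in Python. This is only an illustration, not how CKAN generates its RDF: it assumes the standard-library xml.etree.ElementTree module, a cut-down copy of the sample package, and a hypothetical helper name (split_subjects). It also drops the repeated terms in the original list, which is an extra assumption on top of Steve's suggestion.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cut-down copy of the sample package, keeping only dc:subject.
SAMPLE = """<rdf:RDF
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://ckan.net/package/rdf/usa-courts-gov">
    <dc:subject>us, courts, case-law, us, courts, case-law, gov,
legal, law, access-bulk</dc:subject>
  </rdf:Description>
</rdf:RDF>"""


def split_subjects(rdf_xml):
    """Replace each comma-separated dc:subject with one element per term.

    Hypothetical helper: repeated terms are dropped, first-seen order is kept.
    """
    root = ET.fromstring(rdf_xml)
    tag = "{%s}subject" % DC
    for desc in root.iter("{%s}Description" % RDF):
        seen = []
        for elem in list(desc.findall(tag)):
            for term in (elem.text or "").split(","):
                term = term.strip()
                if term and term not in seen:
                    seen.append(term)
            desc.remove(elem)
        for term in seen:
            ET.SubElement(desc, tag).text = term
    return root


root = split_subjects(SAMPLE)
subjects = [e.text for e in root.iter("{%s}subject" % DC)]
print(subjects)
```

With one literal per dc:subject, a plain triple pattern can match a single term exactly, with no store-specific text search needed.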

Andy Powell

Oct 15, 2009, 8:55:24 AM
to uk-government-...@googlegroups.com
Couple of suggestions...

- add sioc namespace declaration
- use DC Terms namespace rather than the older DC 1.1 namespace (unless you had a specific reason for wanting the older definitions?)
- make more use of rdf:resource
- separate out DC subject terms - more verbose but also more explicit

...giving you something like this:

<?xml version="1.0" encoding="UTF-8"?>
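<rdf:RDF
xmlns:dc="http://purl.org/dc/terms/"
xmlns:ckan="http://ckan.net/ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:sioc="http://rdfs.org/sioc/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://ckan.net/package/rdf/usa-courts-gov">
<rdf:type rdf:resource="http://ckan.net/ns#Package"/>
<foaf:isPrimaryTopicOf rdf:resource="http://ckan.net/package/usa-courts-gov" />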


<dc:title>Text of US Federal Cases</dc:title>
<dc:description rdf:parseType="Literal">This is an archive of US court videos</dc:description>

<foaf:homepage rdf:resource="http://public.resource.org/" />
<ckan:downloadUrl rdf:resource="http://bulk.resource.org/courts.gov/" />


<sioc:has_creator>US Courts</sioc:has_creator>
<dc:contributor>Public.Resource.Org</dc:contributor>

<dc:rights rdf:resource="http://opendatacommons.org/licenses/odbl/1.0/" />


<dc:subject>us</dc:subject>
<dc:subject>courts</dc:subject>
<dc:subject>case-law</dc:subject>

<dc:subject>gov</dc:subject>
<dc:subject>legal</dc:subject>
<dc:subject>law</dc:subject>
<dc:subject>access-bulk</dc:subject>
</rdf:Description>
</rdf:RDF>

I'm not sure why you have used rdf:parseType only on dc:description?

Andy

________________________________

Andy Powell
Research Programme Director
Eduserv

andy....@eduserv.org.uk
01225 474319 / 07989 476710
www.eduserv.org.uk
efoundations.typepad.com
twitter.com/andypowe11

Martyn Bedford

Oct 15, 2009, 9:22:18 AM
to uk-government-...@googlegroups.com

Regards
Martyn

Keith Alexander

Oct 15, 2009, 9:26:22 AM
to uk-government-...@googlegroups.com
Hi,


On Thu, Oct 15, 2009 at 1:55 PM, Andy Powell <andy....@eduserv.org.uk> wrote:
>
> Couple of suggestions...
>
> - add sioc namespace declaration

If this is to allow sioc:has_creator, then I'm not sure it has the
intended semantics - sioc:has_creator has a range of
http://rdfs.org/sioc/ns#User

I would suggest, either using http://purl.org/dc/elements/1.1/creator,
or if you can find or mint URIs to use instead of literal values,
http://purl.org/dc/terms/creator
The older dc namespace allows a literal value for creator, the
newer one (terms) says you should point to a resource.

> - use DC Terms namespace rather than the older DC 1.1 namespace (unless you had a specific reason for wanting the older definitions?)

the definition of dcterms:subject is different from dc:subject, and says:
"This term is intended to be used with non-literal values as defined
in the DCMI Abstract Model
(http://dublincore.org/documents/abstract-model/)."

So if you are continuing to use literal values, you should continue to
use dc/elements/1.1/ instead of dc/terms
(alternatively, find or mint appropriate uris to use)

> - make more use of rdf:resource

Yes, eg, foaf:isPrimaryTopicOf should have a resource value, not a literal.

> - separate out DC subject terms - more verbose but also more explicit

+1

parseType="Literal" is only necessary if the predicate element
contains XML that shouldn't be parsed as RDF/XML


Keith Alexander
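
The "resource value, not a literal" point above can be sketched as a small rewrite pass. This is an illustration under stated assumptions, not part of CKAN: the helper name (literals_to_resources) is hypothetical, and the set of properties treated as URL-valued is an assumption. Keith names only foaf:isPrimaryTopicOf; foaf:homepage and ckan:downloadUrl are included because Andy's revised sample gives them rdf:resource values.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
CKAN = "http://ckan.net/ns#"

# Assumption: these properties should point at resources, not hold literals.
RESOURCE_PROPERTIES = {
    "{%s}isPrimaryTopicOf" % FOAF,
    "{%s}homepage" % FOAF,
    "{%s}downloadUrl" % CKAN,
}

# Cut-down copy of the original sample, URLs given as literal text.
SAMPLE = """<rdf:RDF
    xmlns:ckan="http://ckan.net/ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://ckan.net/package/rdf/usa-courts-gov">
    <foaf:isPrimaryTopicOf>http://ckan.net/package/usa-courts-gov</foaf:isPrimaryTopicOf>
    <foaf:homepage>http://public.resource.org/</foaf:homepage>
    <ckan:downloadUrl>http://bulk.resource.org/courts.gov/</ckan:downloadUrl>
  </rdf:Description>
</rdf:RDF>"""


def literals_to_resources(root):
    """Move literal URLs into rdf:resource attributes; return the tags changed."""
    changed = []
    for desc in root.iter("{%s}Description" % RDF):
        for prop in desc:
            text = (prop.text or "").strip()
            if prop.tag in RESOURCE_PROPERTIES and text:
                prop.set("{%s}resource" % RDF, text)
                prop.text = None  # the URL now lives in the attribute
                changed.append(prop.tag)
    return changed


root = ET.fromstring(SAMPLE)
changed = literals_to_resources(root)
homepage = root.find(".//{%s}homepage" % FOAF)
print(homepage.attrib)
```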

Kingsley Idehen

Oct 15, 2009, 9:44:39 AM
to uk-government-...@googlegroups.com
David,

Any reason why you don't have an HTML + RDFa representation of the
above? It would simply be part of your existing HTML pages.

--


Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com


David Read

Oct 15, 2009, 10:19:06 AM
to uk-government-...@googlegroups.com
Kingsley,

Putting RDFa in the HTML is an excellent suggestion and we've got it
on the todo list!

Dave

David Read

Oct 15, 2009, 10:21:34 AM
to uk-government-...@googlegroups.com
Steve, Andy and Keith, many thanks for your suggestions - these are
most valuable and will help us to improve CKAN's usefulness.

Any more ideas from anyone, please do say.

Cheers,
David

Andy Powell

Oct 15, 2009, 10:39:42 AM
to uk-government-...@googlegroups.com
> If this is to allow sioc:has_creator, then I'm not sure it has the
> intended semantics - sioc:has_creator has a range of
> http://rdfs.org/sioc/ns#User
>
> I would suggest, either using http://purl.org/dc/elements/1.1/creator,
> or if you can find or mint URIs to use instead of literal values,
> http://purl.org/dc/terms/creator
> The older dc namespace allows a literal value for creator, the
> newer one (terms) says you should point to a resource.
>
> > - use DC Terms namespace rather than the older DC 1.1 namespace
> (unless you had a specific reason for wanting the older definitions?)
>
> the definition of dcterms:subject is different from dc:subject, and
> says:
> "This term is intended to be used with non-literal values as defined
> in the DCMI Abstract Model
> (http://dublincore.org/documents/abstract-model/)."
>
> So if you are continuing to use literal values, you should continue to
> use dc/elements/1.1/ instead of dc/terms
> (alternatively, find or mint appropriate uris to use)

Keith,
Yes, sorry. You are right. I should know my own documents better than that but I haven't looked at them enough recently! :-( Apologies.

So, I think there are some options here, and I think the issues around those options are worth thinking about because they are likely to come up in other areas.

Looking at Expressing Dublin Core metadata using the Resource Description Framework (RDF) - http://dublincore.org/documents/dc-rdf/ - the pattern of use that DCMI suggests for terms like dcterms:subject (in the newer DCMI namespace) in cases where you only have a literal value, is to introduce a blank node with an rdf:value hanging off it, i.e. to do something like

<dcterms:subject>
<rdf:Description>
<rdf:value>law</rdf:value>
</rdf:Description>
</dcterms:subject>

In the context of the discussion here, that would apply to the usage of dcterms:subject, dcterms:contributor and dcterms:creator (if it is decided to use dcterms:creator rather than sioc:has_creator).
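
As a rough sketch of the blank-node pattern above, the following Python builds the same shape for a few subject terms. It assumes the standard-library xml.etree.ElementTree module; the helper name (add_literal_as_blank_node) is hypothetical and is not taken from CKAN or from the DCMI document.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Ask ElementTree to serialise with the conventional prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dcterms", DCTERMS)


def add_literal_as_blank_node(parent, prop, literal):
    """Attach dcterms:<prop> whose value is a blank node carrying rdf:value."""
    prop_elem = ET.SubElement(parent, "{%s}%s" % (DCTERMS, prop))
    # An rdf:Description with no rdf:about is a blank node in RDF/XML.
    node = ET.SubElement(prop_elem, "{%s}Description" % RDF)
    ET.SubElement(node, "{%s}value" % RDF).text = literal
    return prop_elem


root = ET.Element("{%s}RDF" % RDF)
package = ET.SubElement(
    root,
    "{%s}Description" % RDF,
    {"{%s}about" % RDF: "http://ckan.net/package/rdf/usa-courts-gov"},
)
for term in ["us", "courts", "case-law"]:
    add_literal_as_blank_node(package, "subject", term)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Each literal costs three elements instead of one, which is the verbosity trade-off discussed here.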

This is clearly more verbose than the proposed use of dc:subject, dc:contributor and dc:creator but it has the advantage of encouraging people towards the use of resource values rather than literal values - which is a good thing. I don't know how well the introduction of a blank node fits with common Linked Data practice currently?

I'm assuming that we are currently in a mixed environment where there is some use of literal values for subjects and some use of resource values - but that we want to encourage more use of resource values?

I'm also assuming that we want to encourage a single pattern of usage (whether based on DC properties or something else).

If my two assumptions are correct, then I think we are better off moving to the use of the dcterms namespace now, and living with the necessary 'evil' of blank nodes in the short term, because that will leave us in a better place in the longer term.

The downside of course is that it results in the following RDF:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF

xmlns:dcterms="http://purl.org/dc/terms/"


xmlns:ckan="http://ckan.net/ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:sioc="http://rdfs.org/sioc/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://ckan.net/package/rdf/usa-courts-gov">
<rdf:type rdf:resource="http://ckan.net/ns#Package"/>
<foaf:isPrimaryTopicOf rdf:resource="http://ckan.net/package/usa-courts-gov" />

<dcterms:title>Text of US Federal Cases</dcterms:title>
<dcterms:description>This is an archive of US court videos</dcterms:description>


<foaf:homepage rdf:resource="http://public.resource.org/" />
<ckan:downloadUrl rdf:resource="http://bulk.resource.org/courts.gov/" />

<dcterms:creator>
<dcterms:Agent><rdf:value>US Courts</rdf:value></dcterms:Agent>
</dcterms:creator>
<dcterms:contributor>
<dcterms:Agent><rdf:value>Public.Resource.Org</rdf:value></dcterms:Agent>
</dcterms:contributor>
<dcterms:rights rdf:resource="http://opendatacommons.org/licenses/odbl/1.0/" />
<dcterms:subject>
<rdf:Description><rdf:value>us</rdf:value></rdf:Description>
</dcterms:subject>
<dcterms:subject>
<rdf:Description><rdf:value>courts</rdf:value></rdf:Description>
</dcterms:subject>
<dcterms:subject>
<rdf:Description><rdf:value>case-law</rdf:value></rdf:Description>
</dcterms:subject>
<dcterms:subject>
<rdf:Description><rdf:value>gov</rdf:value></rdf:Description>
</dcterms:subject>
<dcterms:subject>
<rdf:Description><rdf:value>legal</rdf:value></rdf:Description>
</dcterms:subject>
<dcterms:subject>
<rdf:Description><rdf:value>law</rdf:value></rdf:Description>
</dcterms:subject>
<dcterms:subject>
<rdf:Description><rdf:value>access-bulk</rdf:value></rdf:Description>
</dcterms:subject>
</rdf:Description>
</rdf:RDF>

Which is significantly more complex than that originally proposed.

The alternatives are either to use only the dc namespace (which runs somewhat counter to the general direction of DCMI) or to use both dc and dcterms as appropriate (but that would be to encourage two patterns of use). Neither of these alternatives seems right to me.

I'd be interested in people's views on this. If nothing else it would provide useful feedback to DCMI about the kind of patterns of RDF usage they are trying to encourage.

Andy

________________________________________

Kingsley Idehen

Oct 15, 2009, 10:57:13 AM
to uk-government-...@googlegroups.com
David,

Don't forget, within <head/>, to use <link/>'s @rel attribute to point
to other metadata representations, e.g. your existing RDF/XML
representation. For instance, if you look up
<http://dbpedia.org/resource/London> and view the source of the
HTML+RDFa representation of the metadata for this entity, you will see:
<link rel="alternate" type="application/rdf+xml"
href="http://dbpedia.org/data/London.rdf" title="RDF/XML Representation" />
<link rel="alternate" type="text/rdf+n3"
href="http://dbpedia.org/data/London.n3" title="RDF N3/Turtle
Representation" />
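
From the consuming side, a client could discover those alternate representations roughly as follows. This is a minimal sketch using Python's standard-library html.parser; the class name (AlternateLinkParser) and the surrounding HTML page are hypothetical, and only the two <link/> elements come from the example above.

```python
from html.parser import HTMLParser

# Hypothetical page wrapping the two <link/> elements quoted above.
PAGE = """<html><head>
<title>London</title>
<link rel="alternate" type="application/rdf+xml"
href="http://dbpedia.org/data/London.rdf" title="RDF/XML Representation" />
<link rel="alternate" type="text/rdf+n3"
href="http://dbpedia.org/data/London.n3" title="RDF N3/Turtle
Representation" />
</head><body></body></html>"""


class AlternateLinkParser(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are also routed here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "alternate" in rel_values:
            self.alternates.append((attr.get("type"), attr.get("href")))


parser = AlternateLinkParser()
parser.feed(PAGE)
for media_type, href in parser.alternates:
    print(media_type, href)
```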
