Couple of problems with data.gov.uk

Al Sutton

Jan 23, 2010, 3:23:38 AM
to UK Government Data Developers
First off, congrats on the public launch. I know it's a big step getting
this far.

There are, however, a couple of big problems I can see. These are:

1) Where to find the sparql endpoints?

- Most developers will be looking for somewhere that tells them what
URL to post their queries to, but for all of the data sets I've looked
at on data.gov.uk the pages don't mention sparql let alone provide a
URL :(.

2) Lots of broken wiki pages giving 404s.

In actual fact, I've yet to find one Wiki link from a data set page
which doesn't give a 404 error. Should these links be removed until
the wiki is working? (Most wikis have a "Create this page" page for
missing entries.)

If anyone can point out something I've missed and that I'm just being
plain dumb please let me know :).

Al.

Danny Ayers

Jan 23, 2010, 6:19:24 AM
to uk-government-...@googlegroups.com

Jeni Tennison

Jan 23, 2010, 6:39:39 AM
to uk-government-...@googlegroups.com
Al,

On 23 Jan 2010, at 08:23, Al Sutton wrote:
> 1) Where to find the sparql endpoints?
>
> - Most developers will be looking for somewhere that tells them what
> URL to post their queries to, but for all of the data sets I've looked
> at on data.gov.uk the pages don't mention sparql let alone provide a
> URL :(.

Very few datasets have been converted into linked data. The linked
data that exists is queryable via the SPARQL page. We have a long way
to go, and it's not going to happen overnight. What we're focusing
on, at the moment, is providing easy recipes and toolsets to enable
departments to do that, and putting together some key URI sets (such
as geographies) that will help us to link the datasets together.

> 2) Lots of broken wiki pages giving 404s.
>
> In actual fact I've yet to find one Wiki link from a data set page
> which doesn't give a 404 error. Should these links be removed until
> the wiki is working? (Most wikis have a "Create this page" page for
> missing entries.)


The links through to the Wiki pages are there to encourage people to
start sharing information, comments, pointers to other data, and so
on. They are only populated to the extent that people have already
done that (which is hardly at all). If you think there should be some
information on a particular Wiki page, you should create it and add it!

Cheers,

Jeni
--
Jeni Tennison
http://www.jenitennison.com

William Waites

Jan 23, 2010, 10:59:56 AM
to uk-government-...@googlegroups.com
On 10-01-23 at 12:39, Jeni Tennison wrote:

> Very few datasets have been converted into linked data.

Jeni, I mentioned some issues surrounding this in the post
I made to the list yesterday. There is indeed information
queryable through the sparql endpoints, but it is not linked
data.

What I mean by this is:

1. The URIs are not dereferenceable
2. There is no metadata in the sparql endpoints that makes
it possible to tell what information we are looking at.
3. There is little-to-no other documentation about what
datasets are available through the sparql interface, but
fix the other two and this comes along for free.

I think that until this has been fixed *no* datasets have
been converted into linked data. They may as well be chains
of bnodes.

Cheers,
-w
--
William Waites
Email: wwa...@gmail.com
UK tel: +44 131 516 3563
UK mob: +44 789 798 9965

Jeni Tennison

Jan 23, 2010, 1:31:47 PM
to uk-government-...@googlegroups.com
William,

On 23 Jan 2010, at 15:59, William Waites wrote:
> On 10-01-23 at 12:39, Jeni Tennison wrote:
>> Very few datasets have been converted into linked data.
>
> Jeni, I mentioned some issues surrounding this in the post
> I made to the list yesterday. There is indeed information
> queryable through the sparql endpoints, but it is not linked
> data.
>
> What I mean by this is:
>
> 1. The URIs are not dereferenceable

We know that there are URIs that aren't dereferenceable. The majority
of these are ones that were minted within RDF that was put together
early on, while we were exploring the best ways of proceeding (and
were entirely dependent on volunteer effort).

There are also a lot of URIs that are dereferenceable, such as:

http://education.data.gov.uk/id/school/103335
http://statistics.data.gov.uk/id/local-authority/00QA

These are the ones that follow the guidelines that we have in place
for how URIs should be structured [1], because it was easy for us to
put in place some routing that would turn requests for those URIs into
relevant SPARQL queries. (That's not to say that these are perfect
either.)
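As a rough illustration of routing requests for pattern-conforming URIs into SPARQL queries, here is a minimal sketch. It assumes a hand-written table of URI patterns modelled on the two example URIs. It also assumes each match becomes a SPARQL `DESCRIBE` of the URI. The route table, the `route_to_sparql` helper and the choice of `DESCRIBE` are hypothetical, and the real data.gov.uk routing may build quite different queries.

```python
import re
from typing import Optional

# Hypothetical route table: URI patterns following the /id/{type}/{key}
# guidelines, modelled on the two example URIs above.
ROUTES = [
    re.compile(r"http://education\.data\.gov\.uk/id/school/\d+"),
    re.compile(r"http://statistics\.data\.gov\.uk/id/local-authority/[0-9A-Z]+"),
]


def route_to_sparql(uri: str) -> Optional[str]:
    """Return a SPARQL query for `uri` if it matches a known URI pattern."""
    for pattern in ROUTES:
        if pattern.fullmatch(uri):
            # Safe to embed: each pattern admits only a fixed host/path shape.
            return f"DESCRIBE <{uri}>"
    return None  # unrouted: the front end would answer 404
```

Anything outside the agreed URI shapes falls through to a 404. That is one reason the older, ad hoc URIs are harder to make dereferenceable.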

> 2. There is no metadata in the sparql endpoints that makes
> it possible to tell what information we are looking at.

What kind of metadata would you like? We're planning voiD descriptions
of the linked data. Do you have other suggestions?

> 3. There is little-to-no other documentation about what
> datasets are available through the sparql interface, but
> fix the other two and this comes along for free.

Yes, that's one of the (huge number of) things we know we need to work
on. One way in which people can help with this is to create Wiki pages
for the datasets. There are a few that have already been created with
some basic information:

https://www.data.gov.uk/wiki/Education_%28dataset%29
https://www.data.gov.uk/wiki/Finance_%28dataset%29

> I think that until this has been fixed *no* datasets have
> been converted into linked data. They may as well be chains
> of bnodes.


I think it's inaccurate to say that no datasets have been converted
into linked data. Edubase and a bunch of ONS data is available like
this, and (trust me!) we're working on more. We simply lack hands. If
you (or anyone else) have time to give us some help, please drop me a
line.

Cheers,

Jeni

[1]: http://writetoreply.org/ukgovurisets/

William Waites

Jan 24, 2010, 4:28:46 AM
to uk-government-...@googlegroups.com
Good morning Jeni and list, responses inline below.

On 10-01-23 at 19:31, Jeni Tennison wrote:

> We know that there are URIs that aren't dereferenceable.
> The majority of these are ones that were minted within
> RDF that was put together early on, while we were exploring
> the best ways of proceeding (and were entirely dependent
> on volunteer effort).

Ok, so these old URIs, are they inconvenient to return information
for given technical issues surrounding the way the back-end is put
together? If so, it's not too late to remove or change them, but
it will be soon. I don't think data.gov.uk can be held to pre-launch
draft implementations as far as URI permanence is concerned but
we're now post-launch.

> There are also a lot of URIs that are dereferenceable, such as:
>
> http://education.data.gov.uk/id/school/103335
> http://statistics.data.gov.uk/id/local-authority/00QA
>
> These are the ones that follow the guidelines that we have in place
> for how URIs should be structured [1], because it was easy for us to
> put in place some routing that would turn requests for those URIs
> into relevant SPARQL queries. (That's not to say that these are
> perfect either.)

I am working on similar things for CKAN/OKF (working prototype at
http://semantic.ckan.net/). The way we handle this is simply to do
"SELECT ?s ?p ?o WHERE { GRAPH <$uri> { ?s ?p ?o } }", where $uri is the
URI being dereferenced. This has the advantage of being trivial to
implement, and the graph for each URI can be populated with arbitrary
triples. YMMV with different back ends.
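Below is a minimal sketch of the per-URI named-graph query described above. It assumes the triplestore keeps each resource's triples in a named graph whose name is the resource URI. The `graph_query` helper is hypothetical. The IRI check matters because `$uri` comes straight from the incoming request and is spliced into the query text. The rejected characters are the ones the SPARQL grammar forbids inside `<...>`.

```python
import re

# Characters not allowed inside a SPARQL IRIREF (<...>): <>"{}|^`\ plus
# control characters and space (U+0000 to U+0020).
_BAD_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def graph_query(uri: str) -> str:
    """Build the dereferencing query: every triple in the named graph <uri>."""
    if _BAD_IRI_CHARS.search(uri):
        raise ValueError(f"refusing to embed unsafe IRI: {uri!r}")
    return f"SELECT ?s ?p ?o WHERE {{ GRAPH <{uri}> {{ ?s ?p ?o }} }}"
```

Rejecting bad IRIs up front keeps a crafted request URI from closing the `GRAPH` clause and injecting its own query.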

>> 2. There is no metadata in the sparql endpoints that makes
>> it possible to tell what information we are looking at.
>
> What kind of metadata would you like? We're planning voiD descriptions
> of the linked data. Do you have other suggestions?

I would suggest starting with simple things like foaf:homepage
(pointing to the source), foaf:page (pointing to the registry)
and maybe doap:maintainer or something pointing at contact people.

voiD is good, adding things like categories is good, etc., but
in terms of getting something workable quickly, adding contact
and source information is going to be a lot easier than modelling
the whole thing with voiD. It can be augmented on another pass.
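Here is a small sketch of that "simple things first" metadata: source homepage, registry page and contacts, emitted as N-Triples (which is also valid Turtle). The helper name and the example.org URIs are placeholders. Putting doap:maintainer on a dataset is the loose "or something" suggestion from the message, not an established convention, since DOAP defines it for projects.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DOAP = "http://usefulinc.com/ns/doap#"


def describe_dataset(dataset: str, homepage: str, registry_page: str,
                     maintainers: list) -> str:
    """Return N-Triples giving a dataset's source, registry entry and contacts."""
    triples = [
        (dataset, FOAF + "homepage", homepage),   # where the data comes from
        (dataset, FOAF + "page", registry_page),  # the registry entry
    ]
    # doap:maintainer is defined for doap:Project; reusing it on a dataset is
    # a stop-gap for contact information, not an agreed convention.
    triples += [(dataset, DOAP + "maintainer", person) for person in maintainers]
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

A voiD description could later be layered on top of these triples without changing them.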

I also wonder about the interaction between voiD and SCOVO. voiD
looks very good for metadata but not so good for data. Is it
coherent if redundant to do:

<something>
a scv:Dataset ;
a void:Dataset .

In any case, I'll look some more at voiD for CKAN. There's nothing
really substantial to share code-wise with the ckan instance underlying
data.gov.uk, just that there should be alternate links and content
negotiation in the ckan part that do redirects to the triplestore
front-end if necessary - this should be easy to do for data.gov.uk as
well. Incidentally, I don't see any mention of API keys for the JSON
API to the registry. Did data.gov.uk disable that, or is it just
hidden?
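The content-negotiation redirect mentioned above might look roughly like this. It assumes the registry serves HTML by default and sends RDF-seeking clients to a triplestore front end with a 303 See Other. The front-end base URL, the media-type sets and the helper names are all placeholders. Real Accept parsing also has more edge cases, such as `text/*` wildcards and parameters other than `q`.

```python
from typing import List, Optional, Tuple

# Placeholder base URL for the triplestore front end (hypothetical).
RDF_FRONTEND = "http://semantic.example.org/data/"
RDF_TYPES = {"application/rdf+xml", "text/turtle", "application/x-turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((fields[0].lower(), q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(prefs, key=lambda pref: -pref[1])


def negotiate(accept: str, package_name: str) -> Tuple[int, Optional[str]]:
    """Return (200, None) to serve the HTML page or (303, url) to redirect."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in RDF_TYPES:
            return 303, RDF_FRONTEND + package_name
        if media_type in HTML_TYPES:
            return 200, None
    return 200, None  # default to the human-readable registry page
```

Browsers keep getting the registry page. A client sending `Accept: application/rdf+xml` is redirected to the RDF view instead.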

> https://www.data.gov.uk/wiki/Education_%28dataset%29

A graph at http://education.data.gov.uk/schools with a (perhaps
paginated and searchable) list of schools would be very helpful
for the public to find information about their children's school.
Similarly for any other type. This would go a long way, I think,
towards making the information accessible.
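A paginated, searchable list like the one proposed could be backed by a query along these lines. This sketch assumes the items are typed with some class and labelled with rdfs:label. The class URI is a parameter rather than the real Edubase vocabulary. It also uses SPARQL 1.1 string functions (`CONTAINS`, `LCASE`) and `LIMIT`/`OFFSET` paging. The `list_query` helper is hypothetical.

```python
from typing import Optional


def list_query(class_uri: str, page: int, page_size: int = 50,
               search: Optional[str] = None) -> str:
    """One page of labelled instances of class_uri, optionally filtered by label."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    filter_clause = ""
    if search:
        # Escape so the search text stays inside a SPARQL string literal.
        escaped = (search.replace("\\", "\\\\").replace('"', '\\"')
                         .replace("\n", "\\n"))
        filter_clause = f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?item ?label WHERE {\n"
        f"  ?item a <{class_uri}> ;\n"
        "        rdfs:label ?label .\n"
        f"{filter_clause}"
        "}\n"
        "ORDER BY ?label\n"
        f"LIMIT {page_size} OFFSET {(page - 1) * page_size}"
    )
```

`ORDER BY` is what makes `OFFSET` paging stable from one request to the next. Without it, the store may return rows in any order.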

> https://www.data.gov.uk/wiki/Finance_%28dataset%29

I think this dataset should be re-done from scratch.

> I think it's inaccurate to say that no datasets have been converted
> into linked data. Edubase and a bunch of ONS data is available like
> this, and (trust me!) we're working on more. We simply lack hands.
> If you (or anyone else) have time to give us some help, please drop
> me a line.


I was engaging in a bit of hyperbole to make a point, and also the
datasets I encountered in my random sampling were pretty bad.

I've done some work already on processing the main dataset underlying
WDMMG, http://semantic.ckan.net/data/ukgov-finances-cra though I may
redo some of it in light of the voiD recommendations. Some if not
all of it might be good to migrate into an official data.gov.uk
namespace, particularly the departments and spending categories.

Kingsley Idehen

Jan 24, 2010, 2:10:55 PM
to uk-government-...@googlegroups.com
William Waites wrote:
> On 10-01-23 at 12:39, Jeni Tennison wrote:

>
>> Very few datasets have been converted into linked data.
>
> Jeni, I mentioned some issues surrounding this in the post
> I made to the list yesterday. There is indeed information
> queryable through the sparql endpoints, but it is not linked
> data.
>
> What I mean by this is:
>
> 1. The URIs are not dereferenceable
> 2. There is no metadata in the sparql endpoints that makes
> it possible to tell what information we are looking at.
> 3. There is little-to-no other documentation about what
> datasets are available through the sparql interface, but
> fix the other two and this comes along for free.
>
> I think that until this has been fixed *no* datasets have
> been converted into linked data. They may as well be chains
> of bnodes.
>
> Cheers,
> -w
I think the single most important thing right now remains: not
conflating data in an RDF Store with Linked Data; they aren't the same
thing. The less confusion, the better for everyone.

As I've stated in the past, the conversation doesn't have to start with
SPARQL. Just as RDBMS conversations don't start with SQL. People (users
or developers) like to see something and then be compelled to drill down
to its constituent parts.

All things RDF already have major image challenges due to lots of
historic issues in the message mangling area. Let's learn from the past
so that the real virtues of what's actually taking place re. govt data
(i.e., handing the Public Data back to the Public that actually own it)
shine through, unencumbered.

--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter: kidehen

Kingsley Idehen

Jan 24, 2010, 2:13:14 PM
to uk-government-...@googlegroups.com
Jeni Tennison wrote:
> William,
>
> On 23 Jan 2010, at 15:59, William Waites wrote:
>> On 10-01-23 at 12:39, Jeni Tennison wrote:

Jeni,

This might just be a case of being clearer via some basic HTML or Wiki
page that distinguishes Open Data (which includes RDF datasets without
dereferenceable URIs) from Linked Open Data re. UK Govt. Data Spaces.


--

Regards,

Kingsley Idehen

Jan 24, 2010, 2:16:31 PM
to uk-government-...@googlegroups.com
William Waites wrote:
> Good morning Jeni and list, responses inline below.
>
> On 10-01-23 at 19:31, Jeni Tennison wrote:
Nice start :-)

Note: DBpedia provides you with a very nice template.

Links:

1. http://dbpedia.org
2. http://wiki.dbpedia.org/Datasets
3. http://wiki.dbpedia.org/OnlineAccess
4. http://wiki.dbpedia.org/Downloads34
5. http://wiki.dbpedia.org/Interlinking
6. etc.

Jeni Tennison

Jan 24, 2010, 2:25:57 PM
to uk-government-...@googlegroups.com
Hi William,

On 24 Jan 2010, at 09:28, William Waites wrote:
> On 10-01-23 at 19:31, Jeni Tennison wrote:
>> We know that there are URIs that aren't dereferenceable.
>> The majority of these are ones that were minted within
>> RDF that was put together early on, while we were exploring
>> the best ways of proceeding (and were entirely dependent
>> on volunteer effort).
>
> Ok, so these old URIs, are they inconvenient to return information
> for given technical issues surrounding the way the back-end is put
> together? If so, it's not too late to remove or change them, but
> it will be soon. I don't think data.gov.uk can be held to pre-launch
> draft implementations as far as URI permanence is concerned but
> we're now post-launch.

To be fair, data.gov.uk is still in beta, and it's going to be the
case generally (because of the distributed and gradual way in which
linked data is going to be introduced into government) that there will
be URI sets and datasets that are in a trial state that should not
(yet) be relied upon. We can't expect everyone to get things right
first time, and if we make that an expectation, we're likely to see
government retreating back from the more open and agile process that I
think most people would like to see.

What *is* needed is the ability to label such sets of URIs as, say,
work in progress rather than finalised. This is part of the general
metadata of URI sets that I think we need to make available.

> I also wonder about the interaction between voiD and SCOVO. voiD
> looks very good for metadata but not so good for data. Is it
> coherent if redundant to do:
>
> <something>
> a scv:Dataset ;
> a void:Dataset .

My understanding is that a void:Dataset is a set of linked data (not
any old dataset), whereas scv:Dataset is a set of statistical
observations. I'd expect some scv:Datasets to be void:Datasets, but
not all of them.

> In any case, I'll look some more at voiD for CKAN. There's nothing
> really substantial to share code-wise with the ckan instance underlying
> data.gov.uk, just that there should be alternate links and content
> negotiation in the ckan part that do redirects to the triplestore
> front-end if necessary - this should be easy to do for data.gov.uk as
> well. Incidentally, I don't see any mention of API keys for the JSON
> API to the registry. Did data.gov.uk disable that, or is it just
> hidden?

I'm not sure how the data.gov.uk website is accessing the CKAN registry.

>> https://www.data.gov.uk/wiki/Education_%28dataset%29
>
> A graph at http://education.data.gov.uk/schools with a (perhaps
> paginated and searchable) list of schools would be very helpful
> for the public to find information about their children's school.
> Similarly for any other type. This would go a long way, I think,
> towards making the information accessible.

I agree. Speaking personally, it's my highest priority to provide this
kind of API-style access to the linked data that we have, so that
ordinary developers don't have to be concerned with creating SPARQL
queries to get useful information.

>> https://www.data.gov.uk/wiki/Finance_%28dataset%29
>
> I think this dataset should be re-done from scratch.

I think most of them need to be redone.

>> I think it's inaccurate to say that no datasets have been converted
>> into linked data. Edubase and a bunch of ONS data is available like
>> this, and (trust me!) we're working on more. We simply lack hands.
>> If you (or anyone else) have time to give us some help, please drop
>> me a line.
>
> I was engaging in a bit of hyperbole to make a point, and also the
> datasets I encountered in my random sampling were pretty bad.
>
> I've done some work already on processing the main dataset underlying
> WDMMG, http://semantic.ckan.net/data/ukgov-finances-cra though I may
> redo some of it in light of the voiD recommendations. Some if not
> all of it might be good to migrate into an official data.gov.uk
> namespace, particularly the departments and spending categories.


That sounds very interesting.

I'm sure that it must be frustrating to see us making mistakes and not
moving as quickly as you'd like. There is a lot going on behind the
scenes, of course. That's why I'd like to encourage you (and others)
to get in touch if you have any spare capacity, so that we can
coordinate our efforts and get more done, faster.

Cheers,

Jeni

William Waites

Jan 27, 2010, 9:26:29 AM
to uk-government-...@googlegroups.com
On 10-01-24 at 19:25, Jeni Tennison wrote:

> I'm not sure how the data.gov.uk website is accessing the CKAN
> registry.

As I understand it they have their own instance of the CKAN software,
so they are not accessing ckan.net itself. I am unsure of the extent
to which they have customised the software.

That said, one of the easier things that could be done is taking that
(meta)data and turning it into RDF. The ckan software does have an API
for registering and querying datasets, but I don't think data.gov.uk
has that exposed. If you can get access to it through internal channels,
it wouldn't be hard to run a cron job that turns the registry into RDF
and shoves it into the triplestore.
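Here is a sketch of the conversion step such a cron job might run. It assumes registry entries arrive as dictionaries with CKAN-style `name`, `title`, `notes` and `url` fields, and those field names should be checked against the actual API. The package URI base is a placeholder, not an agreed data.gov.uk namespace. Fetching from the registry and bulk-loading into the triplestore are left out.

```python
from typing import Dict, Iterable, Iterator

DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical base for package URIs; not an agreed data.gov.uk namespace.
BASE = "http://example.org/package/"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escape backslash, quote, CR, LF)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def package_to_ntriples(pkg: Dict[str, str]) -> Iterator[str]:
    """Yield N-Triples lines for one registry entry (assumed CKAN-style fields)."""
    subject = f"<{BASE}{pkg['name']}>"
    if pkg.get("title"):
        yield f"{subject} <{DCT}title> {literal(pkg['title'])} ."
    if pkg.get("notes"):
        yield f"{subject} <{DCT}description> {literal(pkg['notes'])} ."
    if pkg.get("url"):
        yield f"{subject} <{FOAF}homepage> <{pkg['url']}> ."


def registry_to_ntriples(packages: Iterable[Dict[str, str]]) -> str:
    """Serialise every package; a cron job could write this out and bulk-load it."""
    return "\n".join(line for pkg in packages for line in package_to_ntriples(pkg))
```

N-Triples suits this job because each line stands alone, so a nightly dump can simply be regenerated and reloaded in full.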

A longer term project is to look at making CKAN itself run out of
an RDF back-end instead of a SQL database. Would make it easier to
add extra fields and other descriptive bits. Versioning and provenance
would have to be worked out. I'm thinking something like the changeset
vocabulary might be useful for this. I gather from your blog that you
have gone some distance down that road with a genealogical website - how
did that turn out?
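To give a feel for what changeset-based versioning might look like, here is a rough sketch. It records a one-value edit with the Changeset vocabulary (cs:ChangeSet, cs:subjectOfChange, cs:createdDate, cs:creatorName, cs:changeReason, cs:removal, cs:addition), with each change held as a reified rdf:Statement. How CKAN would actually store and replay such changesets is an open question in the message. The function and the example URIs are purely illustrative.

```python
from datetime import datetime, timezone

CS = "http://purl.org/vocab/changeset/schema#"


def lit(text: str) -> str:
    """Quote a Turtle string literal, escaping backslashes, quotes and newlines."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def changeset(cs_uri: str, subject: str, predicate: str, old: str, new: str,
              creator: str, reason: str, when: datetime) -> str:
    """Turtle for one cs:ChangeSet replacing a single literal value."""
    def stmt(value: str) -> str:
        # Additions and removals are reified rdf:Statements.
        return (f"[ a rdf:Statement ; rdf:subject <{subject}> ; "
                f"rdf:predicate <{predicate}> ; rdf:object {lit(value)} ]")

    return "\n".join([
        f"@prefix cs: <{CS}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{cs_uri}> a cs:ChangeSet ;",
        f"    cs:subjectOfChange <{subject}> ;",
        f"    cs:createdDate {lit(when.isoformat())}^^xsd:dateTime ;",
        f"    cs:creatorName {lit(creator)} ;",
        f"    cs:changeReason {lit(reason)} ;",
        f"    cs:removal {stmt(old)} ;",
        f"    cs:addition {stmt(new)} .",
    ])


if __name__ == "__main__":
    print(changeset("http://example.org/cs/1", "http://example.org/package/x",
                    "http://purl.org/dc/terms/title", "Old title", "New title",
                    "A. Editor", "typo fix",
                    datetime(2010, 1, 27, 9, 26, 29, tzinfo=timezone.utc)))
```

Chaining changesets (for example with cs:precedingChangeSet) would give the provenance trail. Rebuilding the current state means applying removals and additions in order.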

Martin Waller

Jan 28, 2010, 4:05:58 PM
to UK Government Data Developers
Hello,

I totally agree with this posting. In my mind, the whole thing has
gone off half-cocked. Perhaps my expectations were too high, but I was
hoping to find access to live data (as live as it could possibly be,
though I accept that government statistics are going to be... delayed,
shall we say) and code examples of how data could be extracted on the
fly from the data store. In reality, I found a web site that was
difficult to dig into and a forum that needed a username and password,
with nowhere to obtain a username and password. I only found out that
this Google Group existed by accident, too!

I went to have a look at the US version http://data.gov and I found
the site easier to use and I already have a username / password for
the forums.

I have seen a few examples and I've looked at the applications
available and I'd bet that most of them have been written around
static data extracted and cleaned up from sites referenced by http://data.gov.uk
and are not taking 'live' data directly.

I don't think you're being plain dumb Al, I think that's the way it
is.

Martin

http://www.OrwellAIS.com

Ed Summers

Jan 28, 2010, 6:15:44 PM
to uk-government-...@googlegroups.com
I think you should all be really impressed with what you have
achieved. I'm on the other side of the pond, and see a whole lot in
the approach you have taken that I think is somewhat lacking at
data.gov currently.

Specifically I'm thinking of the dataset descriptions being made
available for both people and machines using RDFa. I think it's an
elegant demonstration of simple things that can be done (adjusting
some html templates?) to make these datasets available as Linked
Data...by leveraging information already found in data stores.

I realize you all probably have your hands full at the moment...but a
few suggestions/ideas from an outsider:

- layer some rdfa into the tag pages (e.g. Alcohol [1]) using the skos
[2], commontag [3] or some other vocabulary. This would allow the
topics to be labeled for the machines, and would encourage the use of
the vocabulary inside and outside the data.gov.uk site.

- mint URIs for the creators of datasets, like "North West Public
Health Authority (NWPHO)" [4] and use them in the dcterms:creator
statements. Having authoritative URIs for these organizations could be
valuable in data created elsewhere on the web. It would also be a
handy place to hang datasets from a particular organization, and link
outwards onto the gov.uk web.

- link the dataset descriptions to the dataset resources themselves
(pdf, csv, xls files) using the oai-ore [5] or powder [6]
vocabularies. This would allow the datasets to be harvested in
addition to the descriptions. I imagine it would be some work to
figure out the URIs for the dataset files, but not a lot compared to
the challenge of converting the datasets themselves into RDF.

- augment the syndicated feed [7] with the metadata that is known on
the dataset description pages, and include links to dataset files if
possible. It would also be good if you could page through the
entire set of datasets using link rel="next" ... this would make it
relatively easy for someone to keep a synchronized view of what
datasets are available.
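The first three suggestions (SKOS tags, creator URIs, ORE links to the files) could come together in a dataset description roughly like the one below. It is emitted as N-Triples only because that is easy to check. The same triples could be embedded as RDFa in the HTML templates. The organisation and tag URI bases are hypothetical placeholders. The sketch also treats the dataset description itself as the ORE aggregation of its files, which is a simplification of the full ORE model.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ORE = "http://www.openarchives.org/ore/terms/"
# Hypothetical URI bases; data.gov.uk has not minted these.
ORG_BASE = "http://example.org/id/organisation/"
TAG_BASE = "http://example.org/id/tag/"


def slug(name: str) -> str:
    """Lower-case, hyphen-separated path segment for minting a URI."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def lit(text: str) -> str:
    """Quote an N-Triples string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def describe(dataset: str, creator: str, tags: list, files: dict) -> str:
    """N-Triples for one dataset: creator URI, SKOS tags and ORE file links."""
    org = ORG_BASE + slug(creator)
    out = [
        f"<{org}> <{FOAF}name> {lit(creator)} .",
        f"<{dataset}> <{DCT}creator> <{org}> .",
    ]
    for tag in tags:
        concept = TAG_BASE + slug(tag)
        out += [
            f"<{concept}> <{RDF}type> <{SKOS}Concept> .",
            f"<{concept}> <{SKOS}prefLabel> {lit(tag)} .",
            f"<{dataset}> <{DCT}subject> <{concept}> .",
        ]
    # Simplification: the dataset description doubles as the ORE aggregation.
    for url, fmt in files.items():
        out += [
            f"<{dataset}> <{ORE}aggregates> <{url}> .",
            f"<{url}> <{DCT}format> {lit(fmt)} .",
        ]
    return "\n".join(out)
```

For example, "North West Public Health Authority (NWPHO)" would get the URI path `north-west-public-health-authority-nwpho`, stable enough to reuse in data published elsewhere.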
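For the feed suggestion, RSS 2.0 has no paging mechanism of its own. One common approach, assumed here, is to embed the Atom `<atom:link rel="next" href="...">` element from RFC 5005 (Feed Paging and Archiving) in each RSS channel. The sketch below shows how a client could then keep a synchronised list of datasets by walking the pages. `fetch` is injected so the walker stays network-free. In practice it might be `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def walk_feed(first_url: str, fetch: Callable[[str], str],
              max_pages: int = 1000) -> List[str]:
    """Collect every item link, following atom:link rel="next" page by page."""
    seen = set()
    items: List[str] = []
    url = first_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(fetch(url))
        # RSS 2.0 <item><link> elements point at the dataset pages.
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            if link:
                items.append(link)
        # The next page, if any, is announced with an Atom paging link.
        url = next((el.get("href") for el in root.iter(ATOM_LINK)
                    if el.get("rel") == "next"), None)
    return items
```

The `seen` set and the `max_pages` cap guard against a feed whose "next" links loop back on themselves.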

Anyhow, keep up the good work!

//Ed

[1] http://data.gov.uk/data/tag/alcohol
[2] http://www.w3.org/TR/skos-reference/
[3] http://www.commontag.org/Specification
[4] https://www.data.gov.uk/dataset/alcohol_profile_-_alcohol-attributable_hospital_admission_males_2005-06
[5] http://www.openarchives.org/ore/1.0/vocabulary
[6] http://www.w3.org/TR/powder-dr/
[7] http://www.data.gov.uk/rss.xml

Mark Birbeck

Jan 29, 2010, 5:29:31 AM
to uk-government-...@googlegroups.com
Hi Ed,

> I think you should all be really impressed with what you have
> achieved. I'm on the other side of the pond, and see a whole lot in
> the approach you have taken that I think is somewhat lacking at
> data.gov currently.
>
> Specifically I'm thinking of the dataset descriptions being made
> available for both people and machines using RDFa.  I think it's an
> elegant demonstration of simple things that can be done (adjusting
> some html templates?) to make these datasets available as Linked
> Data...by leveraging information already found in data stores.

You are right that this is a nice feature.

But if we're going to pass praise around, it's only fair to
acknowledge that it was the Australian data gov site that did this
first. :)

Last October they launched data.australia.gov.au, which included RDFa
for each dataset. For example:

<http://backplanejs.appspot.com/rdfa?url=http://data.australia.gov.au/602>

There are some aspects of the RDFa that need fixing (as there are with
the UK one), but I particularly like the presence of triples that
indicate the type of the data that is linked to. In the example just
referenced, the data is indicated to be in a spreadsheet:

<http://www.australiacouncil.gov.au/__data/...UPLOAD.xls>
<http://purl.org/dc/elements/1.1/format> "XLS" .

Regards,

Mark

--
Mark Birbeck, webBackplane

mark.b...@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Ed Summers

Jan 29, 2010, 9:57:37 AM
to uk-government-...@googlegroups.com
On Fri, Jan 29, 2010 at 5:29 AM, Mark Birbeck
<mark.b...@webbackplane.com> wrote:
> Last October they launched data.australia.gov.au, which included RDFa
> for each dataset.

Wow, I had no idea--thanks Mark.

I just supplemented my post about data.gov.uk with another one [1]
about data.australia.gov.au. It makes me wonder if there might be an
opportunity here to encourage some common vocabulary use to make
aggregation and other services possible. I'm not sure what the right
venue for that is, perhaps the w3c egov working group? Maybe they've
already tackled it?

//Ed

[1] http://inkdroid.org/journal/2010/01/29/data-australia-gov-au-and-rdfa/
