Paul Mcginness
Oct 27, 2010, 6:28:16 PM
to ScotlandsData
I think in general all these datasets could be useful to third-party
developers. I can imagine most of them being used either to create a
standalone application or as part of some aggregation of content
around a subject or context, such as hyperlocal sites.
If you were to review these on a case-by-case basis, I think there
are a few dimensions you could use to organise them and potentially
prioritise the work on them. I think the ambition should be not just
to fulfil an obligation to publish these datasets but to think about
how to get them syndicated as widely as possible, and to measure
success by the number of data points viewed by end users (via
third-party developers).
I would distinguish between datasets that should be presented only
as a data dump and those that should also be wrapped in a meaningful
package and exposed as an API.
This will depend on a number of factors, including how the data will
be used. For instance, the Traveline Scotland data is actually pretty
useful in raw form, but it could also be exposed as a routefinder API
service, which would let developers create some nice applications
without having to constantly download, process and host a large
number of datasets. In some cases, offering an associated widget to
consume the data could be the best way of getting it syndicated.
The size, complexity and update frequency of a dataset also play a
part in whether you might prefer a data dump over an API offering
search and single-record download. We are seeing this in our work
with Experian, where resellers of Experian data find a simple API
more attractive than having to host and manage large datasets with
security restrictions surrounding them.
Another important factor is whether the data is already available in
a better form elsewhere. For instance, it is going to be hard to beat
Google's routefinder for driving directions (not just for accuracy,
but for performance and stability), but for cycling routes, buses,
trains and flights Google's coverage may be a little more patchy, and
the data from the Traffic Scotland site will be very useful.
Finally, I think some of these datasets will take much more work to
sanitise than others. The SCAN database has some metadata, but a
large tranche of its content is locked up as text in image files.
Although you could offer these out as they are, you would get much
more utility if the content were extracted and made available as
data. This is a practical cost/benefit decision about doing the work,
and it should be based on demand as much as anything.
Thanks