Transport data

Christopher Gutteridge

Nov 12, 2010, 6:17:49 AM
to UK Government Data Developers
a few things about the transport dataset:

-reliability of the transport data-

Now I've looked into the transport data, it looks very useful to combine
with information about our university campus, but I can't find enough
information to be confident about it. I got as far as
http://transport.data.gov.uk/def/naptan but that tells me nothing.

Before I make the effort to link to this data, I'd like to know three
things: are the URIs reliable or still being experimented on, is there a
policy on corrections and frequency of updates, and where can I get a
summary of the classes used? My ideal solution is to just use the
government-assigned URIs for nearby transport points, but only if they
are reasonably settled and maintained.

-Searching Talis-

I've only just figured out that the search on Talis datapoints is
returning something very useful, e.g.
http://services.data.gov.uk/transport/search

When I tried it ages ago in a browser, Firefox rendered it as an RSS feed
which just said item,item,item,item and looked like a very lame list of
URIs.

I didn't think to look at the source. When I did yesterday, it turned
out to be dead useful! http://is.gd/gXUo0

Suggestion to Talis: add an rss:description to warn people that the data
won't show up in a conventional RSS reader.
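
For anyone else caught out by the feed rendering, here is a minimal Python
sketch of reading the per-item data out of an RSS 1.0 (RDF) document instead
of relying on the browser view. The sample feed, the example.org URIs and the
ex:label property are made up for illustration and are not real search
output; only the RSS 1.0 and RDF namespaces are the standard ones.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample in the general shape of an RSS 1.0 feed (NOT real output).
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:ex="http://example.org/vocab#">
  <item rdf:about="http://example.org/id/stop/1">
    <title>item</title>
    <ex:label>High Street (example)</ex:label>
  </item>
  <item rdf:about="http://example.org/id/stop/2">
    <title>item</title>
    <ex:label>Station Road (example)</ex:label>
  </item>
</rdf:RDF>
"""


def extract_items(feed_xml):
    """Return a list of (item URI, {property tag: text}) pairs."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter(f"{{{RSS}}}item"):
        uri = item.get(f"{{{RDF}}}about")
        # Keep every child element, not just the bare rss:title.
        props = {child.tag: (child.text or "").strip() for child in item}
        items.append((uri, props))
    return items


if __name__ == "__main__":
    for uri, props in extract_items(SAMPLE_FEED):
        print(uri, props)
```

The point is simply that the extra properties sit inside each item element,
where a conventional RSS reader ignores them.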

-uk-specific-vocab-

As a side note, it's a pity that this is using a UK-government-specific
vocab. While naptanCode and nptgLocality are clearly UK-specific
concepts, it would help interoperability if classes like
http://transport.data.gov.uk/def/naptan/OnStreetStopPoint were defined
in a separate schema so that other countries could reuse them.
We benefit in shared tools if other countries adopt the same schema, and
that is more likely to happen if the UKisms are separate from the
useful-to-all bits.

By no means do I think that anything's been done badly; it's all
groundbreaking, and some things will only become apparent when we try to
use the data.

--
Christopher Gutteridge -- http://id.ecs.soton.ac.uk/person/1248

/ Lead Developer, EPrints Project, http://eprints.org/
/ Web Projects Manager, ECS, University of Southampton, http://www.ecs.soton.ac.uk/
/ Webmaster, Web Science Trust, http://www.webscience.org/

Roger Slevin

Nov 17, 2010, 8:02:38 AM
to uk-government-...@googlegroups.com
Christopher Gutteridge wrote:

"As a side note, it's a pity that this is using a UK-government-specific
vocab. While naptanCode and nptgLocality are clearly UK-specific
concepts, it would help interoperability if classes like
http://transport.data.gov.uk/def/naptan/OnStreetStopPoint were defined
in a separate schema so that other countries could reuse them.
We benefit in shared tools if other countries adopt the same schema, and
that is more likely to happen if the UKisms are separate from the
useful-to-all bits."

NaPTAN and NPTG are defined in a UK schema which pre-dates any European
standardisation in this area of transport data. These GB standards have
been significant inputs to the CEN IFOPT standard (Identification of Fixed
Objects in Public Transport), which takes the conceptual basis of the UK
(and other) such datasets and establishes a single standard for such data.
IFOPT includes extensions beyond what is held in NaPTAN to handle
accessibility and interchange information, which are not currently held in
any national data within GB.

NaPTAN and NPTG can be mapped to the IFOPT standard, and a further standard
now under development within CEN, NeTEx, will provide mechanisms for the
exchange of such data, and ultimately schedules data and fares data, between
information systems anywhere.

The challenge with any such evolution of standards is to know when it is
right to upgrade the GB implementation. Given the significant investment in
information systems that depend on the current standards, this is neither an
easy nor an inexpensive task. The business case is not there at present to
justify any major change in GB practice in this area, but development work
is taking place which may lead to an upgrade in standards at some time in
the future.

Roger

Jeni Tennison

Nov 19, 2010, 5:56:30 AM
to uk-government-...@googlegroups.com
Hi Chris,

On 12 Nov 2010, at 11:17, Christopher Gutteridge wrote:
> -reliability of the transport data-
>
> Now I've looked into the transport data, it looks very useful to combine with information about our university campus, but I can't find enough information to be confident about it. I got as far as http://transport.data.gov.uk/def/naptan but that tells me nothing.
>
> Before I make the effort to link to this data, I'd like to know three things: are the URIs reliable or still being experimented on, is there a policy on corrections and frequency of updates, and where can I get a summary of the classes used? My ideal solution is to just use the government-assigned URIs for nearby transport points, but only if they are reasonably settled and maintained.

The transport data that is available is based on traffic flow, NaPTAN and NPTG data that was converted statically by third parties (i.e. people within the data.gov.uk project rather than the owners of that information themselves). It is not 'live' or maintained in the way that you and other people who want to use it would obviously like. However, the URIs that the data uses won't be changed.

There is a real chicken-and-egg issue here. If no one uses the URIs or the data then it will be easy for the government to stop supporting them. If people start to use them, and build cool things on them, then the government will be more likely to support them properly.

The good thing is that data like the transport data doesn't change particularly rapidly (it's rare for stations to move or for new ones to get built) so the fact the data isn't live shouldn't be too much of an issue (though you should watch out for temporary bus stops).

The other thing to say is that even if the URIs no longer resolved, they would still be useful identifiers that could be used within datasets maintained by third parties to help with joining data. It wouldn't be an ideal situation if they no longer resolved, but they would still have some use.
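
To make the joining point concrete, here is a minimal Python sketch of two independently maintained datasets being merged purely on a shared stop URI, with no HTTP request made to the URI itself. All URIs, building names and stop names are hypothetical, made up for illustration.

```python
# Hypothetical data: a campus dataset and a third-party stop dataset that
# happen to use the same stop URIs as identifiers.
campus_buildings = [
    {"building": "Building A", "nearest_stop": "http://example.org/id/stop/1"},
    {"building": "Building B", "nearest_stop": "http://example.org/id/stop/9"},
]

third_party_stops = {
    "http://example.org/id/stop/1": {"name": "High Street (example)"},
    "http://example.org/id/stop/2": {"name": "Station Road (example)"},
}


def join_on_stop_uri(buildings, stops):
    """Attach the stop name to each building by matching on the stop URI.

    The URI is treated as an opaque key; it is never dereferenced.
    Buildings whose stop URI is unknown get stop_name=None.
    """
    joined = []
    for b in buildings:
        stop = stops.get(b["nearest_stop"])
        joined.append({**b, "stop_name": stop["name"] if stop else None})
    return joined


if __name__ == "__main__":
    for row in join_on_stop_uri(campus_buildings, third_party_stops):
        print(row)
```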

> -uk-specific-vocab-
>
> As a side note, it's a pity that this is using a UK-government-specific vocab. While naptanCode and nptgLocality are clearly UK-specific concepts, it would help interoperability if classes like http://transport.data.gov.uk/def/naptan/OnStreetStopPoint were defined in a separate schema so that other countries could reuse them. We benefit in shared tools if other countries adopt the same schema, and that is more likely to happen if the UKisms are separate from the useful-to-all bits.

There are a few things to say here.

There is a purposeful strategy at work here, which ties into what I see as one of the advantages of using RDF over other approaches. Basically, we want to show that it is *not* necessary to do up-front international standardisation in order to publish data that is reusable and mergeable with data from other countries. Nor is it necessary to perform up-front *national* standardisation of every single piece of data that is published across government or the public sector within the UK.

Instead, we can take an iterative, evolutionary approach. If you have some data that you want to publish, and you can't find an existing vocabulary to use to publish it then it's OK to make up your own and publish your data *now*. If other people with similar data do the same, eventually there will come a point where you and they want to get together to create something that you all use consistently, because that will bring benefits to both the people using your data (because their processing code will be simpler) and to yourselves (because you'll be able to share tools). What's more, this standardisation will be based on experience rather than theory.

The promise of RDF is that it will make this easier because the links between vocabularies can be articulated declaratively, and because we can use (whisper it) some reasoning to map from one vocabulary to another.
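
As a rough illustration of what a declarative link plus a little reasoning can look like, here is a minimal Python sketch using plain tuples as triples. It assumes a hypothetical shared class (the example.org StopPoint URI) that the UK class is declared a subclass of; the only URI taken from this thread is OnStreetStopPoint. A real system would use an RDF library and a proper reasoner; this just applies the RDFS subclass rule by hand.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# UK class URI from the thread; the generic class is hypothetical.
UK_STOP = "http://transport.data.gov.uk/def/naptan/OnStreetStopPoint"
GENERIC_STOP = "http://example.org/transport#StopPoint"

# The mapping between vocabularies is itself just data.
mapping = {(UK_STOP, SUBCLASS_OF, GENERIC_STOP)}

# Hypothetical instance data published with the UK vocabulary.
data = {("http://example.org/id/stop/1", RDF_TYPE, UK_STOP)}


def infer_types(triples):
    """Apply 'x type C, C subClassOf D => x type D' until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        types = [(s, o) for s, p, o in inferred if p == RDF_TYPE]
        for inst, cls in types:
            for sub, sup in subclass:
                new = (inst, RDF_TYPE, sup)
                if cls == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_types(data | mapping)):
        print(triple)
```

Code written against the generic class would then pick up the UK data without the UK publisher having changed anything.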

We will see how this plays through with INSPIRE, which of course is doing top-down standardisation of things like transport networks. The intention is to map those models into EU-wide vocabularies, then map the specific UK vocabularies onto those.

> By no means do I think that anything's been done badly; it's all groundbreaking, and some things will only become apparent when we try to use the data.

Quite right. Please do give us feedback about this data and how we can improve it.

Cheers,

Jeni
--
Jeni Tennison
http://www.jenitennison.com

BillRoberts

Nov 22, 2010, 12:15:02 PM
to UK Government Data Developers
Does anyone know if the NAPTAN is indexed spatially? I've tried a
SPARQL query like this at http://services.data.gov.uk/transport/sparql

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

select * where {
  ?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/northing> ?n .
  ?s <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/easting> ?e .
  FILTER (?n > "166700"^^xsd:integer &&
          ?n < "166800"^^xsd:integer &&
          ?e > "375800"^^xsd:integer &&
          ?e < "375900"^^xsd:integer)
} limit 10

to find all the transport access points within a particular
rectangular area, but the query times out. I can see that this is
quite hard work for the SPARQL endpoint to process. Are there any
features supported along the lines of http://code.google.com/p/geospatialweb/wiki/SpatialIndex
that could provide this sort of info without hammering the server too
badly?

Thanks

Bill


Jeni Tennison

Nov 23, 2010, 3:18:25 PM
to uk-government-...@googlegroups.com
Bill,

One thing that I've found helps is to provide a scope using a locality or administrative area or something similar that narrows down the number of things that are potentially within the area. Unfortunately, the traffic flow data uses local authorities while NaPTAN uses NPTG transport areas, so you'd have to do a bit of fiddling to get it to work.
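
A minimal Python sketch of that scoping idea, building a query string that first restricts to one area and only then applies the easting/northing filter. The area predicate and area URI are placeholders (hypothetical), since the actual property linking a stop to its NPTG area would need to be looked up in the data; the easting and northing properties are the ones used in the original query.

```python
OS_NS = "http://data.ordnancesurvey.co.uk/ontology/spatialrelations/"


def build_scoped_bbox_query(area_predicate, area_uri,
                            min_e, max_e, min_n, max_n, limit=10):
    """Build a SPARQL query: things in one area, inside an easting/northing box.

    area_predicate and area_uri are caller-supplied; nothing here knows
    which predicate the real data uses.
    """
    return f"""PREFIX spatial: <{OS_NS}>
SELECT ?s ?e ?n WHERE {{
  ?s <{area_predicate}> <{area_uri}> .
  ?s spatial:easting ?e ;
     spatial:northing ?n .
  FILTER (?e > {int(min_e)} && ?e < {int(max_e)} &&
          ?n > {int(min_n)} && ?n < {int(max_n)})
}} LIMIT {int(limit)}"""


if __name__ == "__main__":
    print(build_scoped_bbox_query(
        "http://example.org/def/inArea",   # hypothetical predicate
        "http://example.org/id/area/1",    # hypothetical area URI
        375800, 375900, 166700, 166800,
    ))
```

Whether this actually avoids the timeout depends on how the store orders its joins, so treat it as something to try rather than a guaranteed fix.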

Jeni

--
Jeni Tennison
http://www.jenitennison.com

BillRoberts

Nov 23, 2010, 3:28:31 PM
to UK Government Data Developers
Hi Jeni

Thanks - good idea. I'll try out some ideas along those lines.

Cheers

Bill

Alex Tucker

Nov 24, 2010, 5:37:32 AM
to uk-government-...@googlegroups.com, BillRoberts
Hi Bill,

> Does anyone know if the NAPTAN is indexed spatially? I've tried a
> SPARQL query like this at http://services.data.gov.uk/transport/sparql

I'm pretty sure that neither the Talis nor TSO stores use spatial
indexes. One place to look is the Virtuoso-backed LOD cloud, which I
believe sucks up the NaPTAN triples, amongst everything else. As far as
I understand it, Virtuoso does some limited spatial indexing on WGS84
coordinates and has specific built-in predicates to let you search for
things within a given radius of a lat/lon [1].
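
A minimal Python sketch of what such a radius query might look like, built as
a string. It assumes, from my reading of the Virtuoso documentation, that the
built-ins are bif:st_intersects and bif:st_point (longitude first, radius in
kilometres) and that resources carry a geo:geometry value; I have not checked
that the NaPTAN triples in the LOD cloud actually have one, and the
coordinates below are arbitrary.

```python
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def build_radius_query(lon, lat, radius_km, limit=10):
    """Build a Virtuoso-style SPARQL query for things near a WGS84 point.

    Assumption: bif:st_intersects(geometry, point, km) and
    bif:st_point(lon, lat) behave as described in the Virtuoso docs.
    """
    return (
        f"PREFIX geo: <{GEO_NS}>\n"
        "SELECT ?s WHERE {\n"
        "  ?s geo:geometry ?geo .\n"
        f"  FILTER (bif:st_intersects(?geo, "
        f"bif:st_point({float(lon)}, {float(lat)}), {float(radius_km)}))\n"
        f"}} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Arbitrary example point and a 1 km radius.
    print(build_radius_query(-1.4, 50.9, 1))
```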

Note that the query you posted wouldn't actually benefit from spatial
indexing...

Alex.

[1] http://www.openlinksw.com/dataspace/oerling/weblog/Orri%20Erling's%20Blog/1587

Phil Archer

Nov 18, 2010, 5:24:21 PM
to uk-government-...@googlegroups.com
Just a quick follow up on this - see inline below

On 12/11/2010 11:17, Christopher Gutteridge wrote:
> a few things about the transport dataset:

[..]

>
> -Searching Talis-
>
> I've only just figured out that the search on Talis datapoints is
> returning something very useful, e.g.
> http://services.data.gov.uk/transport/search
>
> When I tried it ages ago in a browser, Firefox rendered it as an RSS feed
> which just said item,item,item,item and looked like a very lame list of
> URIs.
>
> I didn't think to look at the source. When I did yesterday, it turned
> out to be dead useful! http://is.gd/gXUo0
>
> Suggestion to Talis: add an rss:description to warn people that the data
> won't show up in a conventional RSS reader.
>

We've now raised this as a feature request in the relevant forum.
Further comment is welcome through

http://talisplatform.zendesk.com/entries/333706-add-rss-description-to-search-results

HTH

Phil
--


Phil Archer
Talis Platform
Web: http://www.talis.com
Twitter: philarcher1
LinkedIn: http://uk.linkedin.com/in/philarcher
Personal: http://philarcher.org


Talis Information Systems,
Knights Court
Solihull Parkway
Birmingham Business Park
B37 7YB
United Kingdom
