TfL timetable data released in London Datastore

NoamB

Sep 3, 2010, 10:38:38 AM
to Transit Developers
Great news!
Yesterday TfL (Transport for London) released all timetable data for
buses, tube and ferry in Greater London into the London Datastore:
http://data.london.gov.uk/datastore/package/tfl-timetable-listings

The data is in ~800 XML files, all in TransXChange, and it seems the
only external references they make are StopPointRefs into the NaPTAN
dataset (available through data.gov.uk). I haven't checked the license
yet, so I don't know whether it's different from the rest of the
datasets released by TfL to the London Datastore.
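
Because StopPointRef is the only link from these files into NaPTAN, a quick first check is to list the distinct StopPointRef values in one TransXChange file. This is a minimal Python sketch. It assumes the references appear as plain-text StopPointRef elements and matches them by local name, so the exact namespace URI does not matter. `stop_point_refs` is a hypothetical helper and is not part of any converter:

```python
import xml.etree.ElementTree as ET


def stop_point_refs(xml_bytes):
    """Return the sorted, distinct StopPointRef values in one TransXChange document."""
    root = ET.fromstring(xml_bytes)
    refs = set()
    for element in root.iter():
        # Compare local names so a '{namespace}' prefix on the tag is ignored.
        if element.tag.rsplit("}", 1)[-1] == "StopPointRef" and element.text:
            refs.add(element.text.strip())
    return sorted(refs)
```

You could feed each of the ~800 files in the archive through it in turn and compare the union of the results against NaPTAN.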

I tried to convert the data to GTFS and failed :( I was hoping
someone with more experience than me with TransXChange2GTFS might help
(Joa?)


Cheers
Noam

Joa

Sep 4, 2010, 12:59:03 PM
to Transit Developers
Wow, I poked around the London Datastore a few weeks ago in hopes of
finding this. Now that it's out, I didn't waste any time trying it out
this morning. The following steps create a GTFS feed from the
TransXChange file set:

1. Download the TransXChange file set; it is a zip archive. No need to
unpack it.
http://data.london.gov.uk/datastore/package/tfl-timetable-listings
T&C's apply.

2. Grab the latest stop set from NaPTAN.
2.1 It is hosted here:
http://www.dft.gov.uk/public-transportdatamanagement/DataUser_Login.aspx
Registration required. T&C's apply.
2.2 Navigate to download:
NaPTAN Reference Data
Preferences
- Data Required: All Areas
- Data Format: CSV ZIP
- Data Version: Version 1
2.3 Extract the Stops.csv file from the zip archive.

3. Convert
3.1 Use the latest version of TransXChange2GoogleTransit (1.6.5).
Download here:
http://code.google.com/p/googletransitdatafeed/downloads/list
Released under Apache 2.0 License
3.2 In steps 1 and 2.3, I downloaded the relevant data into a
directory named TfL. The following command references this directory:
$ ./tXCh2GT.bat TfL/tfl-timetable-listing.zip http://www.london.gov.uk Europe/London 3 TfL/GTFS TfL/Stops.csv
Converting the TransXChange archive should take anywhere from 15
minutes to an hour. On a 2GHz/2GB notebook PC, it took about 45 minutes.

4. Discussion
The resulting GTFS archive is, of course, huge. I wasn't surprised to
find stop_times.txt at around 900MB. The "kicker", at least at this
stage, is the calendar exceptions (exception_type == 2) in
calendar_dates.txt: this file comes to a scary 6GB. A first analysis
shows that the originating TransXChange file set contains services
that are marked with <DaysOfNonOperation> for an entire year; services
starting with 76167 are an example. Although this appears technically
correct, it leads to an "excessive" number of calendar exceptions
across such services. Going forward, I see two alternatives:
4.1 The originator of the data verifies whether this "blanket blackout"
of services across an entire year is correct. To me, it looks like
"this can't be right".
4.2 If the feed is indeed provided as intended, I'll see if I can
modify the converter to drop such services altogether. This would also
help reduce the size of trips.txt and stop_times.txt.
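
Before anyone changes the converter (4.2), you can find the affected services in the GTFS output itself. The Python sketch below flags service_ids where calendar_dates.txt cancels every day that calendar.txt schedules, using exception_type 2 rows. It skips any service that also has exception_type 1 (added) dates. It assumes both files have already been read into dictionaries keyed by the standard GTFS column names, for example with `csv.DictReader`. The function names are hypothetical:

```python
from collections import defaultdict
from datetime import date, timedelta

# GTFS calendar.txt weekday columns, indexed like date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_gtfs_date(value):
    """Parse a GTFS YYYYMMDD string into a datetime.date."""
    return date(int(value[0:4]), int(value[4:6]), int(value[6:8]))


def fully_removed_services(calendar_rows, calendar_dates_rows):
    """Return service_ids whose every scheduled day is cancelled."""
    removed = defaultdict(set)   # service_id -> dates with exception_type 2
    has_additions = set()        # service_ids with any exception_type 1 row
    for row in calendar_dates_rows:
        if row["exception_type"] == "2":
            removed[row["service_id"]].add(row["date"])
        elif row["exception_type"] == "1":
            has_additions.add(row["service_id"])

    blacked_out = []
    for row in calendar_rows:
        service_id = row["service_id"]
        if service_id in has_additions:
            continue
        day = parse_gtfs_date(row["start_date"])
        end = parse_gtfs_date(row["end_date"])
        scheduled_any, all_removed = False, True
        while day <= end:
            if row[WEEKDAYS[day.weekday()]] == "1":
                scheduled_any = True
                if day.strftime("%Y%m%d") not in removed[service_id]:
                    all_removed = False
                    break
            day += timedelta(days=1)
        if scheduled_any and all_removed:
            blacked_out.append(service_id)
    return blacked_out
```

The sketch keeps every removed date in memory, and for a 6GB calendar_dates.txt that set may itself be too large. Filtering rows to a service_id prefix of interest before calling it would help.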

5. Feed validation
Given the above findings, I was concerned that the GTFS feed validator
might not be able to process the feed, and indeed it couldn't, at least
on a machine with the specs above.

6. Conclusion
If we (as in the developer community) get lucky, we might be able to
engage TfL in a dialog about the problems that the TransXChange file set
may or may not have. I will give this a few days to see if we can get
some guidance. Beyond that, I'll also see if I can work around the
"blanket blackout" problem (see 4.2).

JP


HealsJnr

Sep 7, 2010, 3:13:49 AM
to Transit Developers
Hi Joa,

Thanks for the informative post. I've run the converter and generated
the output as expected (although I did have to set the -Xmx flag to
about 6GB).

Once the data was processed, we ran it through our own GTFS processing
step, which failed while reading calendar_dates.txt. Initially I thought
this was due to the sheer size of the file, but on further
investigation it appears the file itself contains some invalid records.
Below is an extract from calendar_dates.txt (I've added line numbers):

1127, SId_76167m3,20101230,2
1128, SId_76167m3,20101231,2
1129, SId_76167m3,null,1
1130, SId_76167m3,null,1
1131, SId_76167m0_76167mm0H1670138@06:31:00,20100101,2
1132, SId_76167m0_76167mm0H1670138@06:31:00,20100102,2

This looks like it is happening for about 1,300 service IDs. The
tXCh2GT output also contains a number of errors in stops.txt: 55 stops
have both latitude and longitude set to "OpenRequired" (which is
slightly unexpected). Below is an excerpt from our log:

Exception parsing stop record 970.
490000238006,Tower Hill (London), Tower Hill,,OpenRequired,OpenRequired,,,
Exception parsing stop record 1045.
9400ZZLUDGY1,Dagenham, Dagenham Heathway,,OpenRequired,OpenRequired,,,
Exception parsing stop record 1212.
490000276005,Woodside Park, Woodside Park,,OpenRequired,OpenRequired,,,
Exception parsing stop record 1327.
490000254009,Waterloo (London), Waterloo,,OpenRequired,OpenRequired,,,

Not sure if these issues are due to the original TransXchange data or
due to the processing, but I thought you might be interested.

Cheers,

David.

Noam Ben Haim

Sep 7, 2010, 3:58:44 PM
to transit-d...@googlegroups.com
Without looking at the code, I believe the OpenRequired values indicate a stop code that is missing from Stops.csv. I know that when I ran the process without passing it a Stops.csv file, all of the lat/lng values in stops.txt were "OpenRequired".
And indeed,
$ grep 490000254009 Stops.csv
yields no result.

It seems to me TfL are referencing non-existent stops (at least compared to the NaPTAN dataset I am using, from the UK data.gov.uk datastore).

N

Roger Slevin

Sep 7, 2010, 4:09:29 PM
to transit-d...@googlegroups.com

There are two types of “missing” stop references in the data.

The first type is inevitable because the schedule data is updated weekly, whereas the NaPTAN data from data.gov.uk is only updated four times a year. So the timetables will reference stops that are not yet in NaPTAN until TfL also publishes its NaPTAN data more frequently and directly through the Datastore.

The other type (stops with codes beginning with 999) relates to stops outside Greater London for which TfL has not obtained the correct NaPTAN stop reference from the authority in whose area the stop sits. TfL is being encouraged to get the correct NaPTAN references for the relatively small number of 999-prefixed stops as soon as possible.
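
Taken together with the grep check earlier in the thread, this suggests a bulk version: collect the stop references that have no NaPTAN record and split them into the two groups described above. This is a minimal Python sketch. Two points are assumptions. First, the NaPTAN identifier column is called `ATCOCode`; check the header of the Stops.csv you downloaded. Second, every non-999 miss is simply newer than the quarterly NaPTAN snapshot; that is an inference, not a certainty. The function name is hypothetical:

```python
def unresolved_stop_refs(stop_ids, naptan_rows, naptan_id_column="ATCOCode"):
    """Group stop references that have no NaPTAN record.

    naptan_rows: dicts read from NaPTAN Stops.csv (e.g. with csv.DictReader).
    naptan_id_column: assumed header name; adjust to match your file.
    """
    known = {(row.get(naptan_id_column) or "").strip() for row in naptan_rows}
    groups = {"prefix_999": [], "other": []}
    for stop_id in sorted(set(stop_ids)):
        if stop_id in known:
            continue
        # 999-prefixed: stops outside Greater London without a proper NaPTAN code.
        # Anything else: assumed to be newer than the NaPTAN snapshot in use.
        groups["prefix_999" if stop_id.startswith("999") else "other"].append(stop_id)
    return groups
```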

Roger

Joa

Sep 7, 2010, 10:14:35 PM
to Transit Developers


On Sep 7, 12:58 pm, Noam Ben Haim <noa...@gmail.com> wrote:
> Without looking at the code, I believe the OpenRequired indicate a missing
> stopcode in Stops.csv.

Correct, these cases are a result of unresolved stop point references.
The converter generates "OpenRequired" in place of coordinates. The
GTFS spec does not specify a placeholder for undefined coordinates.

Joa

Sep 7, 2010, 10:17:52 PM
to Transit Developers


On Sep 7, 12:13 am, HealsJnr <heals...@gmail.com> wrote:
> 1129, SId_76167m3,null,1
> 1130, SId_76167m3,null,1

It appears the operator/service 76/167 revealed a bug in the
converter; I will look into it.
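
Until the converter is fixed, rows like the two `null` entries quoted above can be caught before a feed is loaded. This is a minimal Python sketch that streams calendar_dates.txt and reports two kinds of bad row: a date that is not a real YYYYMMDD day, or an exception_type other than 1 or 2 (the two values GTFS defines). The function names are hypothetical:

```python
import csv
from datetime import datetime


def is_gtfs_date(value):
    """True if value is an 8-digit YYYYMMDD string naming a real calendar day."""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def invalid_calendar_dates(lines):
    """Yield (line_number, row) for rows with a bad date or exception_type.

    `lines` may be an open file or any iterable of CSV lines. Rows are read
    one at a time, so a multi-gigabyte file never has to fit in memory.
    Line numbers assume no blank lines and no quoted multi-line fields.
    """
    reader = csv.DictReader(lines)
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        date_value = (row.get("date") or "").strip()
        exception_type = (row.get("exception_type") or "").strip()
        if not is_gtfs_date(date_value) or exception_type not in ("1", "2"):
            yield line_number, row
```

In practice you would pass it a file opened with `open("calendar_dates.txt", newline="")`.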

Joa

Sep 7, 2010, 10:30:35 PM
to Transit Developers


On Sep 7, 12:13 am, HealsJnr <heals...@gmail.com> wrote:
> I've run the converter and generated
> the output as expected (although i did have to set the -xmx flag to
> about 6GB).

I am testing on a vanilla 32-bit XP box with the stock Sun JVM; 1024m
was sufficient to process the TfL release.
Sounds like you're running this in a 64-bit environment?

Joa

Sep 7, 2010, 10:32:23 PM
to Transit Developers


On Sep 7, 7:14 pm, Joa <joachim.pfeif...@gmail.com> wrote:
> On Sep 7, 12:58 pm, Noam Ben Haim <noa...@gmail.com> wrote:
>
> > Without looking at the code, I believe the OpenRequired indicate a missing
> > stopcode in Stops.csv.
>
> Correct, these cases are a result of unresolved stop point references.
> The converter generates "OpenRequired" in place of coordinates. The
> GTFS spec does not specify a placeholder for undefined coordinates.

Joe Hughes

Sep 9, 2010, 1:30:21 PM
to Transit Developers
Yes, GTFS was designed for geographical applications, and in many
cases stops without coordinates would be useless for consuming apps.
Thus there's an implicit assumption that such incomplete entries would
be omitted from GTFS feeds.
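
Following that implicit assumption, a post-processing step could drop stops whose coordinates are not numeric, such as the "OpenRequired" placeholder, together with the stop_times rows that point at them. This is a minimal Python sketch over rows read with `csv.DictReader`, using the standard GTFS column names. The helper names are hypothetical:

```python
def has_coordinates(row):
    """True if stop_lat/stop_lon parse as numbers within valid lat/lon ranges."""
    try:
        lat = float(row["stop_lat"])
        lon = float(row["stop_lon"])
    except (KeyError, TypeError, ValueError):
        return False  # e.g. the "OpenRequired" placeholder
    # NaN fails both comparisons, so it is rejected here as well.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def located_stops(stops_rows):
    """Keep only stops.txt rows that have usable coordinates."""
    return [row for row in stops_rows if has_coordinates(row)]


def stop_times_for(stop_times_rows, stop_ids):
    """Lazily keep stop_times.txt rows whose stop_id is retained (streams large files)."""
    return (row for row in stop_times_rows if row["stop_id"] in stop_ids)
```

Dropping stop_times rows can leave a trip with fewer than two stops, so a real pipeline would also need to decide what to do with such trips.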

Joe

HealsJnr

Sep 12, 2010, 6:59:37 PM
to Transit Developers
One other question I had about the data and about the TxC2GTFS
converter: Are there any plans to derive the transport mode of a
service from the TXC data directly rather than specifying a default
transport mode?

In the case of London, the most practical way to convert the data is
using the zip file, meaning all services have the same transport mode
(in this case Bus (3)). Whilst it would be possible to extract the
zip, process each file individually specifying a transport mode, and
then recombine the output, this would be time consuming and error
prone given there are 800 or so individual files.
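
An alternative to extracting the archive is to read each member in place with Python's `zipfile` module. The sketch below rests on two assumptions it does not verify. First, each TransXChange Service element carries ServiceCode and Mode children. Second, the archive holds the files as plain .xml members. Tags are matched by local name, so the namespace URI does not matter. All function names are hypothetical:

```python
import xml.etree.ElementTree as ET
import zipfile


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def service_modes(xml_bytes):
    """Map each Service's ServiceCode to its Mode text (None if no Mode is given)."""
    root = ET.fromstring(xml_bytes)
    modes = {}
    for element in root.iter():
        if local_name(element.tag) != "Service":
            continue
        code, mode = None, None
        for child in element:
            name = local_name(child.tag)
            if name == "ServiceCode":
                code = (child.text or "").strip()
            elif name == "Mode":
                mode = (child.text or "").strip()
        if code:
            modes[code] = mode
    return modes


def modes_in_archive(zip_path):
    """Read every .xml member of the archive without extracting it to disk."""
    result = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                result[name] = service_modes(archive.read(name))
    return result
```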

Having looked at the TXC spec, there is a VehicleModeType identifier;
however, as far as I can see this is a free field, with no defined
transport modes as there are in GTFS. Hence, to convert to GTFS, there
would need to be a mapping file as part of the input specifying how
each VehicleModeType maps to a GTFS mode.

Has anyone tried to do this before? Does it sound feasible? Apologies
if this has been raised before, or if there is a more appropriate
forum to raise these suggestions.

Cheers,

Roger Slevin

Sep 13, 2010, 5:38:37 AM
to transit-d...@googlegroups.com

Table 6-10 in the TransXChange guidance shows the following as permitted values for Mode:

Value          Description
air            Air service.
bus            Bus service.
coach          Coach service.
underground    Metro service.
ferry          Ferry service.
train          Train service.
tram           Tram service.
underground    Underground service.
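
The mapping proposed earlier in the thread could be as small as a dictionary from these values to the basic GTFS route_type codes (0 tram, 1 subway/metro, 2 rail, 3 bus, 4 ferry). The choices below are a hypothetical sketch. Coach is folded into bus. Air has no basic GTFS route type, so it falls back to a caller-supplied default. The sketch also accepts `metro` and `rail` in case the data uses those spellings:

```python
# Hypothetical mapping from TransXChange mode values to basic GTFS route_type codes.
TXC_MODE_TO_ROUTE_TYPE = {
    "tram": 0,         # tram, streetcar, light rail
    "underground": 1,  # subway, metro
    "metro": 1,
    "train": 2,        # rail
    "rail": 2,
    "bus": 3,
    "coach": 3,        # folded into bus; no basic GTFS coach type
    "ferry": 4,
}


def route_type_for(mode, default=3):
    """Return the GTFS route_type for a TransXChange mode, or `default` if unmapped."""
    if not mode:
        return default
    return TXC_MODE_TO_ROUTE_TYPE.get(mode.strip().lower(), default)
```

Passing `default=None` lets the caller detect unmapped modes such as air instead of silently treating them as bus.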

Roger
