Sources for historical data on causes for cancellations and delays


ro...@inasight.com

Dec 21, 2016, 8:47:06 AM
to A gathering place for the Open Rail Data community
Hi

As an exciting home Xmas project (!) I was looking for data on the causes of delays/cancellations, focused on the Network Rail side rather than the TOC side. Ideally I'd get this from Network Rail (e.g. problem with signal X), but there doesn't seem to be an open data feed holding that data.

From NRE, there is data in the Darwin Push Port about reasons for delays and cancellations, so I can get at these in the FTP downloads. This is far from ideal, though, as the data is very non-specific: e.g. it says a train was delayed, but gives no detail of where along its route the delay was "caused". Also, it's not clear where this reason code is captured from (anyone know?), and I suspect the data quality might be poor at times.

Also, assuming I use this data from the FTP server, I'd need to download it all once per day for some period of time to get a sufficient volume of data, as there only seems to be today's data available. Not a huge problem, but I'd rather download it in bulk.
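The once-a-day approach can be automated by copying each day's download into a date-stamped archive, so that a history builds up over time. A minimal Python sketch, assuming the file has already been fetched from the FTP server by some other means; `archive_daily_file` and the file names are hypothetical.

```python
import datetime as dt
import shutil
from pathlib import Path
from typing import Optional


def archive_daily_file(downloaded: Path, archive_dir: Path,
                       day: Optional[dt.date] = None) -> Path:
    """Copy a freshly downloaded file into archive_dir under a dated name.

    Running this once per day (e.g. from cron) accumulates a history that
    the server itself does not keep, since it only offers today's data.
    """
    day = day or dt.date.today()
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with YYYYMMDD so archived files sort chronologically.
    target = archive_dir / f"{day:%Y%m%d}_{downloaded.name}"
    shutil.copy2(downloaded, target)  # copy2 also preserves the file's timestamps
    return target
```

Keeping the original file name after the date prefix means several different feeds can share one archive directory without clashing.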

Any ideas of better strategies?

Thanks
R

Peter Hicks

Dec 21, 2016, 9:00:02 AM
to ro...@inasight.com, A gathering place for the Open Rail Data community
Hi Robin

Have a look at "Historic Delay Attribution" under http://www.networkrail.co.uk/transparency/datasets/. It's the most specific you're likely to be able to get, and goes down to delay reasons and possibly even assets and individual trains. This isn't a real-time feed, but a snapshot of historical data after attribution has taken place.
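Once the delay attribution files are downloaded, a typical first pass is simply totalling delay minutes per reason code. A minimal Python sketch, assuming a CSV export with a header row; the column names `INCIDENT_REASON` and `PFPI_MINUTES` are assumptions to check against the actual files, so they are exposed as parameters.

```python
import csv
from collections import Counter
from typing import TextIO


def delay_minutes_by_reason(f: TextIO,
                            reason_col: str = "INCIDENT_REASON",  # assumed column name
                            minutes_col: str = "PFPI_MINUTES"     # assumed column name
                            ) -> Counter:
    """Sum delay minutes per reason code from a delay attribution CSV."""
    totals: Counter = Counter()
    for row in csv.DictReader(f):
        try:
            minutes = float(row[minutes_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric minutes value
        reason = (row.get(reason_col) or "").strip() or "UNKNOWN"
        totals[reason] += minutes
    return totals
```

Usage: `with open("delays.csv", newline="") as f: print(delay_minutes_by_reason(f).most_common(10))` lists the ten reason codes carrying the most delay minutes.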

The data in the Darwin Push Port isn't specific because it's for customer information purposes, and is designed to be short and informative rather than lengthy and overly specific.


Peter




--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To post to this group, send an email to openrail...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ro...@inasight.com

Dec 21, 2016, 9:43:27 AM
to A gathering place for the Open Rail Data community, ro...@inasight.com
Thanks Peter - perfect. I'd had a good look round but still didn't find this until you pointed me at it. I've had a quick look at the data and it looks pretty good for my use case.
Thanks again
R

Mike Andrews

Jan 9, 2017, 8:10:14 AM
to A gathering place for the Open Rail Data community, ro...@inasight.com
Just thought I'd piggy-back on this thread, as my question is related.

The historical data only runs up to about Aug 2016, I think. I wanted to look at changes and cancellations to the established/expected timetable in connection with the GTR dispute. I was interested to see the reduction in services offered to "improve reliability" in July 2016, the subsequent reduction due to the no-overtime operating environment, which came unannounced in Dec 2016 (and has been extended indefinitely), and the on-the-day experience of any further cancellations/delays. I lost my DARWIN account due to inactivity (crashed server, my fault...), so I was looking at the non-realtime data provided via FTP (datafeeds.nationalrail.co.uk). I pulled the Schedule XML into an Elasticsearch index for a few sample days, and I'm only seeing 1 or 2 services with CancellationReasons denoting lack of staff/resources/industrial action (codes 150,167,183,185,186,523,524,525,526,527,528,886,887).
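The reason-code filter described above can also be sketched in plain Python, without Elasticsearch. This assumes a simplified layout in which each `schedule` element carries a `rid` attribute and a `cancelReason` child whose text is the numeric code; the element names should be checked against the Darwin schema version in use. Namespaces are ignored by matching on local tag names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Staff/resources/industrial-action reason codes listed in the post.
STAFF_CODES = {150, 167, 183, 185, 186, 523, 524, 525, 526, 527, 528, 886, 887}


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so matching works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def staff_cancellations(xml_text: str) -> List[Tuple[str, int]]:
    """Return (rid, code) for schedules whose cancellation code is in STAFF_CODES."""
    root = ET.fromstring(xml_text)
    hits = []
    for sched in root.iter():
        if local(sched.tag) != "schedule":
            continue
        for child in sched:
            text = (child.text or "").strip()
            if local(child.tag) == "cancelReason" and text.isdigit():
                code = int(text)
                if code in STAFF_CODES:
                    hits.append((sched.get("rid", "?"), code))
    return hits
```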

Are there any data sources (historic or realtime) that include this kind of cancellation information? One thing that prompted this was seeing inaccurate reports while waiting for a train, such as cancellations attributed to "industrial action" on a day when there was no "strike".

Cheers,
Mike

Peter Hicks

Jan 9, 2017, 8:40:43 AM
to Mike Andrews, A gathering place for the Open Rail Data community, ro...@inasight.com
Hi Mike

The Schedule XML file is only relevant at the start of the day - it's updated by the Push Port messages and/or snapshots.  What you're analysing is a point-in-time view.
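The point-in-time issue can be illustrated by replaying the day's updates over the start-of-day state. A minimal Python sketch; the dict shapes are hypothetical stand-ins for parsed Schedule XML and Push Port messages, not the real Darwin schema.

```python
from typing import Dict, Iterable, Optional


def apply_updates(start_of_day: Dict[str, Optional[int]],
                  updates: Iterable[Dict]) -> Dict[str, Optional[int]]:
    """Replay updates over a start-of-day snapshot.

    start_of_day maps a service ID (RID) to its cancellation reason code,
    or None if it is running. Each update is a hypothetical
    {"rid": ..., "cancel_reason": ...} dict; later messages win.
    """
    state = dict(start_of_day)  # copy so the original snapshot is untouched
    for msg in updates:
        state[msg["rid"]] = msg.get("cancel_reason")
    return state
```

Counting cancellations in the snapshot alone and in the replayed state can give different answers, since any cancellation made after the file was generated only appears in the later messages.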

Bear in mind too that the cancellation reasons in Darwin are for customer information purposes. If you want to perform a thorough analysis, you will probably want to look at the cancellations on TRUST, as those are analysed more robustly than the ones in Darwin.
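For the TRUST side, one route is Network Rail's train movements feed, in which train cancellation messages have `msg_type` "0002" and carry a `canx_reason_code`. A minimal Python sketch that counts reason codes in one batch of messages; the message shape follows the Open Rail Data wiki's description of the feed and should be checked against live data.

```python
import json
from collections import Counter


def trust_cancellation_reasons(raw: str) -> Counter:
    """Count canx_reason_code values in one batch of TRUST movement messages.

    Assumes the batch is a JSON list of {"header": {...}, "body": {...}}
    objects, where header.msg_type "0002" marks a train cancellation.
    """
    counts: Counter = Counter()
    for msg in json.loads(raw):
        if msg.get("header", {}).get("msg_type") == "0002":
            counts[msg.get("body", {}).get("canx_reason_code", "UNKNOWN")] += 1
    return counts
```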


Peter


Mike Andrews

Jan 10, 2017, 5:36:23 AM
to A gathering place for the Open Rail Data community, jabb...@gmail.com, ro...@inasight.com
Thanks Peter. I'll sit in the pending queue, patiently waiting for my account to be reactivated, as I guess I need an active account to get the S3 CIF links (I'm getting a 401 when calling this with my Authorization header set to my account details, even though I'm subscribed to that TOC feed).
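Network Rail's data feeds use HTTP Basic authentication, so the Authorization header has to carry the Base64 encoding of "username:password". A minimal Python sketch that only builds the request; the endpoint URL and its type/day parameters are assumptions based on the Open Rail Data wiki, and `build_cif_request` is a hypothetical helper.

```python
import base64
import urllib.request

# Assumed endpoint (per the Open Rail Data wiki); check the wiki for the
# exact type/day values for a single TOC's schedule.
CIF_URL = ("https://datafeeds.networkrail.co.uk/ntrod/CifFileAuthenticate"
           "?type=CIF_ALL_FULL_DAILY&day=toc-full")


def build_cif_request(username: str, password: str,
                      url: str = CIF_URL) -> urllib.request.Request:
    """Build a GET request carrying HTTP Basic credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

Passing the request to `urllib.request.urlopen` would send it; a 401 at that point means the server did not accept the credentials, whether because of a typo or an inactive account.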

A quick look on Raildar shows it doesn't report the services that have been "temporarily" cancelled until further notice; hopefully these are still listed in the TRUST dataset. E.g. there used to be twice as many GX services before the overtime ban, such as the 18:15 VIC -> 19:21 BTN.

Cheers,
Mike