Identifying item in incidents (disruptions) feed from Darwin Station Messages.

Ian Bale

Aug 16, 2017, 5:33:32 PM

to A gathering place for the Open Rail Data community

I'm looking at the Station Messages feed from Darwin.

{"ts":"2017-08-16T21:33:53.285073+01:00","stationMessage":{"_payload":{"cat":"Train","id":66965,"sev":1,"Station":[{"crs":"ACT"},{"crs":"AHT"},{"crs":"AHV"},{"crs":"ASH"},{"crs":"BAG"},{"crs":"BCE"},{"crs":"BFN"},{"crs":"BKO"},{"crs":"BTY"},{"crs":"CAM"},{"crs":"CLJ"},{"crs":"EAR"},{"crs":"EGH"},{"crs":"FEL"},{"crs":"FML"},{"crs":"FNH"},{"crs":"GLD"},{"crs":"LNG"},{"crs":"MAO"},{"crs":"NEM"},{"crs":"RDG"},{"crs":"SNG"},{"crs":"SNS"},{"crs":"SUR"},{"crs":"TWI"},{"crs":"VIR"},{"crs":"WAN"},{"crs":"WAT"},{"crs":"WBY"},{"crs":"WIM"},{"crs":"WKM"},{"crs":"WNS"},{"crs":"WOK"},{"crs":"WTI"},{"crs":"WYB"}],"Msg":{"a":{"href":"http://nationalrail.co.uk/service_disruptions/169587.aspx","$t":"Latest Travel News"}}},"_stations":[{"crs":"ACT"},{"crs":"AHT"},{"crs":"AHV"},{"crs":"ASH"},{"crs":"BAG"},{"crs":"BCE"},{"crs":"BFN"},{"crs":"BKO"},{"crs":"BTY"},{"crs":"CAM"},{"crs":"CLJ"},{"crs":"EAR"},{"crs":"EGH"},{"crs":"FEL"},{"crs":"FML"},{"crs":"FNH"},{"crs":"GLD"},{"crs":"LNG"},{"crs":"MAO"},{"crs":"NEM"},{"crs":"RDG"},{"crs":"SNG"},{"crs":"SNS"},{"crs":"SUR"},{"crs":"TWI"},{"crs":"VIR"},{"crs":"WAN"},{"crs":"WAT"},{"crs":"WBY"},{"crs":"WIM"},{"crs":"WKM"},{"crs":"WNS"},{"crs":"WOK"},{"crs":"WTI"},{"crs":"WYB"}]},"origin":"Workstation"}

We have a nice list of stations affected by the incident. But no link to the relevant item in the disruptions feed (https://datafeeds.nationalrail.co.uk/api/staticfeeds/5.0/incidents), just a link to the NRE website which is displaying the text for that incident.

Is there any way to link the two?

If not, I'm thinking of scraping the page from the NRE site, extracting the title and looking for that in the incidents feed. Seems fairly straightforward. Does anyone have any better ideas?

Peter Hicks (Poggs)

Aug 16, 2017, 6:17:54 PM

to Ian Bale, A gathering place for the Open Rail Data community

Hi Ian

You’ve posted some JSON, but the Darwin feed is actually in XML. Here’s the raw message:

<?xml version="1.0" encoding="UTF-8"?>

<ns4:Station crs="ACT"/>

<ns4:Station crs="AHT"/>

<ns4:Station crs="AHV"/>

<ns4:Msg>Journeys between Virginia Water and Weybridge are being delayed by up to 40 minutes. More details can be found in <ns4:a href="http://nationalrail.co.uk/service_disruptions/169587.aspx">Latest Travel News</ns4:a></ns4:Msg>

</OW>

</uR>

</Pport>

Are you certain you’re translating the message in to JSON correctly?
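For comparison, here is a minimal Python sketch of pulling both the descriptive text and the link out of a Msg element like the one above. The JSON posted earlier kept only the link, which suggests the surrounding text was lost in conversion, though that is a guess. The namespace URI in the sample is a placeholder, since the real URI bound to the ns4 prefix is not shown here, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the real one bound to "ns4" is not shown in this thread.
SAMPLE = (
    '<ns4:Msg xmlns:ns4="urn:example:station-messages">'
    'Journeys between Virginia Water and Weybridge are being delayed by up to 40 minutes. '
    'More details can be found in '
    '<ns4:a href="http://nationalrail.co.uk/service_disruptions/169587.aspx">'
    'Latest Travel News</ns4:a>'
    '</ns4:Msg>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_msg(xml_text):
    """Return (plain text, list of hrefs) for one Msg element with mixed content."""
    msg = ET.fromstring(xml_text)
    # itertext() walks the element's own text plus the text and tails of its children.
    text = " ".join("".join(msg.itertext()).split())
    links = [el.get("href") for el in msg.iter()
             if local_name(el.tag) == "a" and el.get("href")]
    return text, links


text, links = parse_msg(SAMPLE)
print(text)
print(links)
```

Matching on the local tag name avoids hard-coding a namespace, which seemed the safer choice when the prefix-to-URI mapping is unknown.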

Peter

Ian Bale

Aug 16, 2017, 6:40:24 PM

to A gathering place for the Open Rail Data community

Of course. Silly of me!

I'm using this library: https://www.npmjs.com/package/openraildata-darwin to read the data into a nodejs app, and I copy/pasted its output, not the original XML. However, the question still stands. There is no reference to the incident ID in the XML...

Peter Hicks (Poggs)

Aug 17, 2017, 5:02:37 AM

to Ian Bale, A gathering place for the Open Rail Data community

Hi Ian

I’ve had a quick look at the incident at Purley this morning. London Bridge has a station message with a link to http://ojp.nationalrail.co.uk/service/ldbboard/dep/LBG, which also appears in the InfoLink tag of incident number ACCBB955316C4EC985BB46CB759C6BCF in the Incidents XML feed.

Incident AA3439B6E5C840858D28E7398970328F at Stevenage appears in the XML feed but not for the station (presumably because it’s cleared).

Try having a look at this correlation for a sample size of more than the three I randomly picked - it’d be useful if you could report back to the group what you find.

Peter

Ian Bale

Aug 17, 2017, 10:29:03 AM

to A gathering place for the Open Rail Data community

I think I have a plan of action now...

Firstly, I've identified that Station Messages contain two types of message: "station" and "train". The "train" ones are the ones I was most interested in, as they provide additional information that I can include in the Incidents data and the service-status data. They contain descriptive text, but no list of affected stations, whereas the station message contains the affected stations, but no detail. As Peter points out, the one common element is the URL to a status page on the NRE site.

So I will:

1. Grab the latest on-demand Incidents data from NRE

2. Subscribe to Darwin live incidents feed

3. Subscribe to Darwin live station message feed

4. Grab the on-demand service-status feed at regular intervals throughout the day: either every X minutes, or possibly each time I get a live incident, as service-status for a TOC may change as a result of that incident.

Then, as I receive each live incident message (step 2) I will update my local copy of the on-demand data I grabbed in step (1). That will mean I maintain a local copy that should match what I'd get if I repeatedly downloaded the on-demand data.
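A rough Python sketch of that update step, keyed by incident number. The field names `incident_id` and `cleared` are hypothetical, as is the assumption that each live message carries the complete incident record rather than a partial change; the real feed would need checking on both points. The two incident numbers are the ones Peter mentioned, and the summaries are placeholder text.

```python
def apply_live_incident(snapshot, update):
    """Apply one live incident message to the local snapshot.

    'incident_id' and 'cleared' are hypothetical field names. The update is
    assumed to be a full record, so it simply replaces whatever was stored.
    """
    incident_id = update["incident_id"]
    if update.get("cleared", False):
        # Drop the incident; ignore clears for incidents we never saw.
        snapshot.pop(incident_id, None)
    else:
        snapshot[incident_id] = update
    return snapshot


# Step 1: snapshot loaded from the on-demand download (placeholder content).
snapshot = {
    "ACCBB955316C4EC985BB46CB759C6BCF": {
        "incident_id": "ACCBB955316C4EC985BB46CB759C6BCF",
        "summary": "placeholder text",
    },
}

# Step 2: live messages arrive and are applied in order.
apply_live_incident(snapshot, {"incident_id": "AA3439B6E5C840858D28E7398970328F",
                               "summary": "placeholder text"})
apply_live_incident(snapshot, {"incident_id": "ACCBB955316C4EC985BB46CB759C6BCF",
                               "summary": "updated placeholder text"})
apply_live_incident(snapshot, {"incident_id": "AA3439B6E5C840858D28E7398970328F",
                               "cleared": True})

print(sorted(snapshot))
```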

Unfortunately, there is no live service-status, so I'm stuck with grabbing the on-demand data from time to time.

Now when I receive a "train" type Station Message, I can locate the matching item in my incidents data using the NRE URL, which is common to both feeds. So I will end up with the incidents feed, but with the addition of the affected stations' CRS codes (which I will probably change to NLC since I use them everywhere else).
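The URL match could look something like the Python sketch below. It assumes the two feeds might write the same link slightly differently (http vs https, with or without "www.", trailing slash), so it compares a normalised host and path rather than the raw string; whether that is actually needed is untested. The keys `info_link`, `link`, `crs` and `stations`, and the incident key `EXAMPLE-1`, are made up for illustration.

```python
from urllib.parse import urlsplit


def link_key(url):
    """Reduce a URL to host + path, ignoring scheme, 'www.', case and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/").lower()


def attach_stations(incidents, station_message):
    """Add the message's CRS codes to each incident whose link matches.

    Returns the ids of the incidents that were matched.
    """
    wanted = link_key(station_message["link"])
    matched = []
    for incident_id, incident in incidents.items():
        if link_key(incident["info_link"]) == wanted:
            merged = set(incident.get("stations", [])) | set(station_message["crs"])
            incident["stations"] = sorted(merged)
            matched.append(incident_id)
    return matched


# Hypothetical incident record; only the URL comes from the message posted above.
incidents = {
    "EXAMPLE-1": {
        "info_link": "https://www.nationalrail.co.uk/service_disruptions/169587.aspx",
    },
}
station_message = {
    "link": "http://nationalrail.co.uk/service_disruptions/169587.aspx",
    "crs": ["ACT", "AHT", "AHV"],
}

matched = attach_stations(incidents, station_message)
print(matched, incidents["EXAMPLE-1"]["stations"])
```

Lower-casing the path is a convenience that would be wrong if the site treats paths as case-sensitive, so it is worth checking against real data first.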

Separately, I will build a new data file which I will populate from "station" type Station Messages. These do not have any NRE link and do not correspond to any incident records. I need to monitor this data for a little longer to fully understand what is being delivered, but I've spotted a few things already.

Messages have IDs. But they are not unique. Many messages are sent using the same ID. These seem to update the previous message content. For example [52513] refers to Ticket Vending Machines that are out of order. Each time it is received there is an updated list of stations that have TVMs out of order. As they get fixed, there will be a new message and the station where the machines were fixed will no longer be present. At some point (hopefully!) there will be a message with no attached station codes, which I am assuming means that there are no stations with TVMs out of order.

[53113] refers to stations where the Ticket Office is closed - presumably when it ought to be open.

I've seen a few referring to lifts out of order, but so far only singles, not in groups. But I presume these will follow the same pattern. A message with a station code (or multiple codes) means that the text applies to those stations. A message with no stations means it's cleared for any station we'd previously received it for.

I need to give a little thought as to how best to maintain this data so that I have an accurate snap-shot at any moment in time that is easily indexed by station code, but also easily indexed by message number so that I can apply updates.
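One possible shape for this, sketched in Python: keep a dictionary keyed by message ID and a reverse index keyed by CRS code, and rebuild the reverse entries each time a message ID is received again. It relies on the assumption above that a message with no stations clears that ID everywhere. The class and method names are invented, and the message texts and station codes in the example are illustrative only.

```python
from collections import defaultdict


class StationMessageStore:
    """Snapshot of 'station' type messages, indexed by message id and by CRS."""

    def __init__(self):
        self.by_id = {}                 # message id -> {"text": str, "crs": set}
        self.by_crs = defaultdict(set)  # CRS code -> set of message ids

    def apply(self, msg_id, text, crs_codes):
        """Replace any earlier version of msg_id; an empty station list clears it."""
        old = self.by_id.pop(msg_id, None)
        if old is not None:
            # Remove the stale reverse-index entries before adding the new ones.
            for crs in old["crs"]:
                self.by_crs[crs].discard(msg_id)
                if not self.by_crs[crs]:
                    del self.by_crs[crs]
        codes = set(crs_codes)
        if not codes:
            return  # assumed to mean the message is cleared
        self.by_id[msg_id] = {"text": text, "crs": codes}
        for crs in codes:
            self.by_crs[crs].add(msg_id)

    def messages_for(self, crs):
        """Texts of all current messages for one station, ordered by message id."""
        return [self.by_id[i]["text"] for i in sorted(self.by_crs.get(crs, ()))]


store = StationMessageStore()
store.apply(52513, "Ticket machines out of order", ["ACT", "AHT"])
store.apply(53113, "Ticket office closed", ["ACT"])
store.apply(52513, "Ticket machines out of order", ["AHT"])  # ACT machines fixed
act_messages = store.messages_for("ACT")
store.apply(52513, "Ticket machines out of order", [])       # all fixed
aht_messages = store.messages_for("AHT")
print(act_messages, aht_messages, sorted(store.by_id))
```

Both lookups stay cheap this way, at the cost of a little bookkeeping on each update.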

I've been building a Data Manager which is going to be used by a ticket retailer. Its purpose is to pull in as many [useful] feeds as possible and, where possible, combine the data, taking the most reliable feed for each type of data (e.g. RCS data for which stations have TVMs is woefully inaccurate, so I will take that from the NRE feed, which seems to be far better). The output will be a "definitive" set of data for stations, fares, fare locations, restriction codes, route codes etc. I'll need to make enquiries with RDG / NRE etc. about re-posting their data. Provided they are happy for me to do this, I will investigate making a public interface so that other users can grab my feeds rather than trying to replicate all this. I may even decide to post the code (node.js application), but to use it you'd need access to all the feeds, not just the currently publicly available ones.

The only thing that I can see as a real issue (other than permission to re-post) is that we are using RCS data which tells us which fares we are permitted to sell, and filtering the other feeds. So the output of my data manager is just data for fares we can sell. Fares we cannot sell are excluded from the feed. If there is enough interest and we deal with the permission issue, then I could just duplicate the code and remove the filters so it spits out a complete set rather than the filtered set we need...

I have a massive work schedule though, so making this public is not something I'm going to address this side of Christmas. Anyway, I hope my analysis of the data above is of some use to others...
