I think I have a plan of action now...
Firstly, I've identified that Station Messages contain two types of message: "station" and "train". The "train" ones are the ones I was most interested in, as they provide additional information that I can include in the Incidents data and the service-status data. The incidents contain descriptive text but no list of affected stations, whereas the station message contains the affected stations but no detail. As Peter points out, the one common element is the URL to a status page on the NRE site.
So I will:
1. Grab the latest on-demand Incidents data from NRE
2. Subscribe to Darwin live incidents feed
3. Subscribe to Darwin live station message feed
4. Grab the on-demand service-status feed at regular intervals throughout the day: either every X minutes, or possibly each time I get a live incident, as the service-status for a TOC may change as a result of that incident.
Then, as I receive each live incident message (step 2) I will update my local copy of the on-demand data I grabbed in step (1). That will mean I maintain a local copy that should match what I'd get if I repeatedly downloaded the on-demand data.
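To make that concrete, here is a rough sketch of keeping the local copy in step with the live feed. The field names (`id`, `cleared`) and the function names are my own placeholders, not the real feed schema, so treat it as an outline of the idea rather than working feed code.

```javascript
// Local copy of the on-demand incidents, keyed by incident ID.
// NOTE: "id" and "cleared" are assumed field names for illustration only.
const incidents = new Map();

// Step 1: seed the local copy from an on-demand download.
function loadSnapshot(snapshot) {
  incidents.clear();
  for (const incident of snapshot) {
    incidents.set(incident.id, incident);
  }
}

// Step 2: apply one live incident message to the local copy.
function applyLiveIncident(incident) {
  if (incident.cleared) {
    // Assumption: a cleared incident should drop out of the local copy.
    incidents.delete(incident.id);
  } else {
    // New incidents are inserted; updates overwrite the previous version.
    incidents.set(incident.id, incident);
  }
}
```

If the live feed ever drops a message, the local copy will drift, so re-running the snapshot load occasionally is probably a sensible safety net.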
Unfortunately, there is no live service-status, so I'm stuck with grabbing the on-demand data from time to time.
Now when I receive a "train" type Station Message, I can locate the matching item in my incidents data using the NRE URL, which is common to both feeds. So I will end up with the incidents feed, but with the addition of the affected stations' CRS codes (which I will probably change to NLC codes, since I use those everywhere else).
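The join on the NRE URL could look something like the sketch below. It assumes the incidents are held in a Map keyed by that URL and that each Station Message has been parsed into an object with `type`, `url` and `stations` properties; those names are mine, not the feed's.

```javascript
// Attach the station codes from a "train" type Station Message to the
// incident that shares its NRE URL. Returns true if a match was found.
// NOTE: the property names (type, url, stations) are assumptions.
function attachStations(incidentsByUrl, message) {
  if (message.type !== "train") return false; // "station" types have no NRE link
  const incident = incidentsByUrl.get(message.url);
  if (!incident) return false; // no matching incident (yet)
  incident.stations = [...message.stations]; // copy so later edits don't leak
  return true;
}
```

Messages that return false for a "train" type are worth holding on to, in case the matching incident turns up later.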
Separately, I will build a new data file which I will populate from "station" type Station Messages. These do not have any NRE link and do not correspond to any incident records. I need to monitor this data for a little longer to fully understand what is being delivered, but I've spotted a few things already.
Messages have IDs. But they are not unique. Many messages are sent using the same ID. These seem to update the previous message content. For example [52513] refers to Ticket Vending Machines that are out of order. Each time it is received there is an updated list of stations that have TVMs out of order. As they get fixed, there will be a new message and the station where the machines were fixed will no longer be present. At some point (hopefully!) there will be a message with no attached station codes, which I am assuming means that there are no stations with TVMs out of order.
[53113] refers to stations where the Ticket Office is closed - presumably when it ought to be open.
I've seen a few referring to lifts being out of order, but so far only singly, not in groups. I presume these will follow the same pattern: a message with a station code (or multiple codes) means that the text applies to those stations, and a message with no stations means it's cleared for any station we'd previously received it for.
I need to give a little thought as to how best to maintain this data so that I have an accurate snap-shot at any moment in time that is easily indexed by station code, but also easily indexed by message number so that I can apply updates.
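One way this might work is to keep two indexes side by side and update both whenever a message arrives. The sketch below is only a first stab: the class and method names are made up, and it relies on my assumption above that a repeated ID replaces the previous content and that an empty station list clears the message.

```javascript
// Snapshot of "station" type messages, indexed both ways.
// NOTE: class/method names are hypothetical; the replace-on-same-ID and
// empty-list-means-cleared behaviour is an assumption from observation.
class StationMessageStore {
  constructor() {
    this.byId = new Map();      // message ID -> { text, stations: Set }
    this.byStation = new Map(); // station code -> Set of message IDs
  }

  // Apply one received message: replaces any earlier message with the same ID.
  apply(id, text, stations) {
    const previous = this.byId.get(id);
    if (previous) {
      // Remove the old version from the station index first.
      for (const code of previous.stations) {
        const ids = this.byStation.get(code);
        ids.delete(id);
        if (ids.size === 0) this.byStation.delete(code);
      }
    }
    if (stations.length === 0) {
      this.byId.delete(id); // no stations: treat the message as cleared
      return;
    }
    this.byId.set(id, { text, stations: new Set(stations) });
    for (const code of stations) {
      if (!this.byStation.has(code)) this.byStation.set(code, new Set());
      this.byStation.get(code).add(id);
    }
  }

  // All current messages for one station code.
  messagesFor(code) {
    const ids = this.byStation.get(code) || new Set();
    return [...ids].map((id) => ({ id, text: this.byId.get(id).text }));
  }
}
```

For example, applying 52513 for two stations and then again for just one of them leaves the fixed station with no messages, and a final apply with an empty list removes 52513 altogether.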
I've been building a Data Manager which is going to be used by a ticket retailer. Its purpose is to pull in as many [useful] feeds as it can and, where possible, combine the data, taking the most reliable feed for each type of data (e.g. RCS data for which stations have TVMs is woefully inaccurate, so I will take that from the NRE feed, which seems to be far better). The output will be a "definitive" set of data for stations, fares, fare locations, restriction codes, route codes etc. I'll need to make enquiries with RDG / NRE etc. about re-posting their data. Provided they are happy for me to do this, I will investigate making a public interface so that other users can grab my feeds rather than trying to replicate all this. I may even decide to post the code (a node.js application), but to use it you'd need access to all the feeds, not just the ones that are currently publicly available.
The only thing that I can see as a real issue (other than permission to re-post) is that we are using RCS data which tells us which fares we are permitted to sell, and filtering the other feeds. So the output of my data manager is just data for fares we can sell. Fares we cannot sell are excluded from the feed. If there is enough interest and we deal with the permission issue, then I could just duplicate the code and remove the filters so it spits out a complete set rather than the filtered set we need...
I have a massive work schedule though, so making this public is not something I'm going to address this side of Christmas. Anyway, I hope my analysis of the data above is of some use to others...