Accessing Darwin late running codes


Ellie McCarthy

Jul 14, 2020, 10:19:39 AM
to A gathering place for the Open Rail Data community
Hi everyone
I'm new to using the Open Rail Data feeds and STOMP, and so far this community has been super helpful :)
I'm still getting to grips with using STOMP and accessing different topics, and I was hoping someone would be able to point me in the right direction with what I'm trying to find. I'm hoping to collect lots of data on past services, specifically those services with late running codes. My understanding is that this data is available through the Darwin Push Port Reference Data; however, I'm struggling to find the topic link (e.g. /topic/<topic name>) needed to access it.
Any pointers would be hugely appreciated!!

Peter Hicks

Jul 14, 2020, 10:23:46 AM
to A gathering place for the Open Rail Data community
Hi Ellie

The Push Port data feed is live and forward-looking, with no ability to query it historically.  You can either capture the messages now and save them locally, or use one of the many archives of data, depending on your use case.  If you want to start an archive without necessarily having to use STOMP, look at the 'Darwin FTP Information' part of the My Feeds page on https://opendata.nationalrail.co.uk/ - there will be some historical data there, but it only goes back a matter of hours from now.

The reference data is only a set of mappings from things like reason code to textual description, train operator code to textual description etc.
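
As a rough illustration of what such a mapping looks like in use, here is a minimal Python sketch that loads reason codes into dictionaries. The sample XML, its element and attribute names (LateRunningReasons, CancellationReasons, Reason, code, reasontext) and the helper load_reason_codes are assumptions for illustration only; check them against an actual reference data file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like a reference data file (assumed layout).
SAMPLE_REF = """<PportTimetableRef>
  <LateRunningReasons>
    <Reason code="100" reasontext="Example late running reason"/>
  </LateRunningReasons>
  <CancellationReasons>
    <Reason code="100" reasontext="Example cancellation reason"/>
  </CancellationReasons>
</PportTimetableRef>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def load_reason_codes(xml_text):
    """Return {'late': {code: text}, 'cancel': {code: text}}."""
    groups = {"LateRunningReasons": "late", "CancellationReasons": "cancel"}
    result = {"late": {}, "cancel": {}}
    root = ET.fromstring(xml_text)
    for group in root:
        key = groups.get(local_name(group.tag))
        if key is None:
            continue  # skip other mappings, e.g. operators or locations
        for reason in group:
            result[key][reason.get("code")] = reason.get("reasontext")
    return result


codes = load_reason_codes(SAMPLE_REF)
print(codes["late"]["100"])
```

Late running and cancellation codes are kept in separate dictionaries here because the same numeric code can carry a different description in each list.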

Best wishes,

Peter Hicks
Director
OpenTrainTimes Ltd.


--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/openraildata-talk/8b2607e4-b379-466c-9b1d-83d3b75fa16ao%40googlegroups.com.


OpenTrainTimes Ltd. registered in England and Wales, company no. 09504022.
Registered office: 13a Davenant Road, Upper Holloway, London N19 3NW

Ellie McCarthy

Jul 15, 2020, 4:31:07 AM
to A gathering place for the Open Rail Data community
Hi Peter

Thanks for such a quick reply. I have now been able to collect data via the FTP, and I am now in the process of interpreting it! I have been going through some old threads, and I noticed that in https://groups.google.com/forum/#!topic/openraildata-talk/WMaZVFI-z54 there was a suggestion of providing more information on snapshot functionality. Has this been made available yet? So far I have been unable to find any additional guidance on processing or interpreting snapshot data: for example, whether there are late running codes within each snapshot, and explanations of each <attribute>.

Many thanks in advance

Peter Hicks

Jul 15, 2020, 5:02:26 AM
to A gathering place for the Open Rail Data community
Hi Ellie

If I read the linked thread correctly, I was going to update the wiki to clarify snapshot functionality, as somebody found the guidance on requesting a snapshot unclear.

From memory, snapshots now arrive every three hours with a set of archived messages since the last snapshot, so the wiki probably needs changing again if I (or anyone else) didn't get around to it the first time.

The simplest way to check if there should be late running codes within a snapshot would be to look at the XML schema definition, rttiCTTSchema_v8.xsd, which contains this:

<xs:element name="cancelReason" type="ct:DisruptionReasonType" minOccurs="0">
    <xs:annotation>
        <xs:documentation>Reason for cancellation of service/location</xs:documentation>
    </xs:annotation>
</xs:element>

From that, I conclude that cancellation reasons are included in the snapshot data, but late running reasons aren't.  However, I suspect there's other data in the snapshot file itself, covered under a different XML schema, which has late reasons too.
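
The same check can be scripted rather than done by eye. This is a minimal Python sketch that lists every element declaration in a schema whose name mentions "reason". The schema fragment is a cut-down, hypothetical one built around the element quoted above (the second element and the ct namespace URI are made up), and find_reason_elements is an illustrative helper, not part of any Darwin tooling.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Cut-down, hypothetical schema fragment built around the element quoted above.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ct="urn:example:common-types">
  <xs:element name="cancelReason" type="ct:DisruptionReasonType" minOccurs="0">
    <xs:annotation>
      <xs:documentation>Reason for cancellation of service/location</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Journey" type="xs:string"/>
</xs:schema>"""


def find_reason_elements(xsd_text):
    """Return (name, documentation) for each xs:element whose name mentions 'reason'."""
    root = ET.fromstring(xsd_text)
    found = []
    for el in root.iter(XS + "element"):
        name = el.get("name", "")
        if "reason" in name.lower():
            doc = el.find(f"{XS}annotation/{XS}documentation")
            text = doc.text.strip() if doc is not None and doc.text else ""
            found.append((name, text))
    return found


matches = find_reason_elements(SAMPLE_XSD)
print(matches)
```

Running it over each .xsd file in the schema bundle, instead of the sample string, would show which schemas declare late running reasons and which only declare cancellation reasons.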

Peter Hicks
Director
OpenTrainTimes Ltd.


RailAleFan

Jul 15, 2020, 5:40:02 AM
to A gathering place for the Open Rail Data community
Hi Ellie,

A cancelReason applies to an entire schedule and appears as:

Pport/uR/schedule/cancelReason (stream / FTP logs)
Pport/sR/schedule/cancelReason (snapshot)

A LateReason applies to one or more locations and is contained in the TS (Train Status) message as

Pport/uR/TS/LateReason (stream / FTP logs)
Pport/sR/TS/LateReason (snapshot)
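
To show how the four paths above could be walked in code, here is a minimal Python sketch (separate from the attached files). The sample messages are hypothetical and heavily trimmed: real messages carry XML namespaces and many more attributes, and the rid attribute used as the train identifier is an assumption to verify against real data. Tags are matched by local name so the sketch does not depend on a particular schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed messages (no namespaces, few attributes).
SAMPLE_MESSAGES = [
    '<Pport><uR><schedule rid="RID1"><cancelReason>100</cancelReason></schedule></uR></Pport>',
    '<Pport><sR><TS rid="RID2"><LateReason>200</LateReason></TS></sR></Pport>',
]


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_reasons(xml_text):
    """Yield (rid, kind, code) for cancelReason under schedule and LateReason under TS."""
    wanted = {"schedule": "cancelReason", "TS": "LateReason"}
    root = ET.fromstring(xml_text)
    for container in root:  # uR (stream / FTP logs) or sR (snapshot)
        if local_name(container.tag) not in ("uR", "sR"):
            continue
        for item in container:  # schedule, TS, and any other message types
            reason_tag = wanted.get(local_name(item.tag))
            if reason_tag is None:
                continue
            for child in item:
                if local_name(child.tag) == reason_tag:
                    yield item.get("rid"), reason_tag, (child.text or "").strip()


rows = [row for msg in SAMPLE_MESSAGES for row in extract_reasons(msg)]
print(rows)
```

Because the same function handles both uR and sR, it can be pointed at stream captures, FTP logs, or snapshots without changes.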

I've attached an example of each from the current snapshot, so yes, it's all in there. The intention of the snapshot is to enable clients to "sync up" with the Darwin database when coming online, so bear in mind that what it contains is limited by that purpose.  Due to the random nature of when it appears, if you're looking to build up a database of trains that ran late or were cancelled, that is probably best done on the stream, but processing the snapshot would be a great way to get started.

Hope this helps!

schedule_cancelReason.xml
TS_LateReason.xml

Ellie McCarthy

Jul 16, 2020, 6:39:18 AM
to A gathering place for the Open Rail Data community
Thank you so much, both of you, it makes much more sense now. Ideally I would use STOMP and have a client download the data for me, but working from home has given me some SSL issues, so building my database from these snapshots seems to be the best option. There are years' worth of snapshot data at https://cdn.area51.onl/archive/rail/darwin/index.html, which is proving extremely helpful.

Am I correct in assuming that there is no overlap between consecutive snapshots?

Best,
Ellie

RailAleFan

Jul 16, 2020, 9:47:32 AM
to A gathering place for the Open Rail Data community
Hi Ellie,

There will be overlap, in that schedules in the current snapshot will also be in the next one, and so on. Most schedules are added to the database overnight, and from then on they will be in every snapshot generated until the train has run and been deactivated shortly after terminating.
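
One way to cope with that overlap when building a database is to key every record on the train's identifier, so a schedule seen in several snapshots is stored once. A minimal Python sketch, assuming each snapshot has already been parsed into a list of dicts with a "rid" key (the record layout, the late_reason field and merge_snapshots are all hypothetical), and assuming the latest snapshot should win when a train appears more than once:

```python
def merge_snapshots(snapshots):
    """Combine per-snapshot records into one dict keyed by RID; later snapshots win."""
    merged = {}
    for snapshot in snapshots:  # assumed to be in chronological order
        for record in snapshot:
            merged[record["rid"]] = record
    return merged


# Hypothetical parsed records from two consecutive snapshots.
snapshot_1 = [{"rid": "RID1", "late_reason": None}, {"rid": "RID2", "late_reason": None}]
snapshot_2 = [{"rid": "RID2", "late_reason": "200"}, {"rid": "RID3", "late_reason": None}]

merged = merge_snapshots([snapshot_1, snapshot_2])
print(sorted(merged))
```

Letting the latest record win is only one possible policy; if a later snapshot can drop detail that an earlier one held, merging field by field would be safer.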

Cheers