Experimental backup for NRDP provision of Darwin push port data

Rail Delivery Group

Sep 18, 2019, 6:52:38 AM
to A gathering place for the Open Rail Data community
Hi All,

We've been conscious of the issues experienced by NRDP users of the Darwin push port data when NRDP loses its connection to Darwin, as evidenced recently by issues with snapshotting within NRDP. Whilst RDG continue to work with CACI to ensure NRDP is as robust as we can make it, inevitably there will be times when, for whatever reason, NRDP is unable to talk to Darwin even though Darwin itself is operating normally.

As mentioned in various other posts, RDG log Darwin data for a number of internal reasons, mainly relating to MI reporting or incident investigations. It has become apparent to us that, as our logging tool is connected to Darwin directly, it is not affected by NRDP issues. This means we are still logging push port data whilst NRDP users are missing out until the NRDP connection is restored. We have therefore decided to experimentally allow access to these Darwin-direct logs to see whether NRDP users find them a useful backup when NRDP itself is not operating normally.

Our logging tool has until now created hourly logs of push port data, as that was sufficient for our use. However, hourly logs aren't useful as a backup for real-time usage, so we have changed the logger to also produce 30s logs of Darwin push port data. This isn't as real-time as connecting to an NRDP topic, but as a backup for most real-time purposes 30s logs are hopefully still useful (and better than nothing!). The logs are written to an AWS S3 bucket for which we've created an SFTP access point. We've reused a bucket where we already provide logs of v16 push port data from NRDP, and created a sub-folder called darwin_direct to contain the logs taken directly from Darwin. As 30s log files will build up into a large volume of files very quickly, we've set this folder up to delete files after 48 hours. However, as suggested by Peter Hicks, we've included the hourly logs in an archive folder for anyone who wants to look further back in time.
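As an illustration of how the 30s files might be used to fill a gap, the Python sketch below works out which darwin_direct files should cover an outage window. It assumes, based only on the two sample attachments (20190918080000_PP.log.gz and 20190918080030_PP.log.gz), that each file is named YYYYMMDDHHMMSS_PP.log.gz after the start of its 30-second window; the timezone of those timestamps isn't stated, so the code uses naive datetimes. Windows older than the 48-hour retention period are skipped, because those files will already have been deleted and the hourly archive would be needed instead. The function names are hypothetical, and fetching the files over SFTP is left out.

```python
from datetime import datetime, timedelta
from typing import List

INTERVAL = timedelta(seconds=30)  # one darwin_direct file per 30s window
RETENTION = timedelta(hours=48)   # 30s files are deleted after 48 hours


def window_start(ts: datetime) -> datetime:
    # Floor a timestamp to the start of its 30s window (:00 or :30).
    return ts.replace(second=(ts.second // 30) * 30, microsecond=0)


def log_name(ts: datetime) -> str:
    # ASSUMPTION: each file is named after its window start,
    # e.g. 20190918080030_PP.log.gz (inferred from the sample attachments).
    return window_start(ts).strftime("%Y%m%d%H%M%S") + "_PP.log.gz"


def names_for_gap(start: datetime, end: datetime, now: datetime) -> List[str]:
    """Names of the 30s files covering [start, end] that should still exist at `now`."""
    earliest = now - RETENTION
    names = []
    t = window_start(start)  # include the window holding the first missed message
    while t <= end:
        if t >= earliest:  # files older than the retention period are already gone
            names.append(log_name(t))
        t += INTERVAL
    return names
```

For example, for a gap from 08:00:10 to 08:01:00 on 18 Sep 2019, checked within 48 hours, `names_for_gap` returns the three files stamped 080000, 080030 and 080100. Flooring the start time matters: the message at 08:00:10 would sit in the file stamped 080000, not 080030.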

Key points to note:
  • This is provided as an experiment and is not guaranteed to be provided indefinitely. NRDP remains the service open data users must use as their main source of Darwin push port data.
  • The logs RDG make available here are from a logging tool we have set up for internal RDG use, so it does not have any out-of-hours support (and any support during normal office hours depends on staff availability!)
  • If the underlying cause of an issue is Darwin itself, our logging tool will be just as affected as NRDP; it is only when Darwin is working normally but NRDP is having trouble connecting to it that these logs potentially become useful to NRDP users.
  • Whilst the schema is the same as NRDP (v16), as this is connected to Darwin directly there are some differences in the service that users need to account for, such as:
    • <uR> and <sR> messages may contain multiple children as permitted by the Darwin schema
    • There is no ability to filter
    • There is no header information: the log files contain the push port XML data only
    • The RDG logging tool handles the correct sequencing of records, including snapshotting when required
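The differences listed above mainly affect parsing. Because a single <uR> or <sR> element in these logs may carry several children, a consumer written against NRDP should loop over every child rather than assume there is only one. A minimal Python sketch follows, assuming (this is not stated in the post) that each line of a decompressed log file holds one complete Pport XML document. The helper names `local_name`, `iter_updates` and `read_log` are hypothetical, and tags are matched by local name so the code does not hard-code the v16 namespace URI.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple


def local_name(tag: str) -> str:
    # Drop any "{namespace}" prefix so matching does not depend on the v16 namespace URI.
    return tag.rsplit("}", 1)[-1]


def iter_updates(lines: Iterable[str]) -> Iterator[Tuple[str, str, ET.Element]]:
    """Yield (container, child_tag, child) for every child of every uR/sR element.

    ASSUMPTION: each non-blank line is one complete Pport XML document.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        root = ET.fromstring(line)
        for container in root:
            kind = local_name(container.tag)
            if kind not in ("uR", "sR"):
                continue
            # A single uR/sR taken directly from Darwin may hold several children,
            # so iterate over all of them instead of reading only the first.
            for child in container:
                yield kind, local_name(child.tag), child


def read_log(path: str) -> Iterator[Tuple[str, str, ET.Element]]:
    """Stream updates from one gzipped log such as 20190918080030_PP.log.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from iter_updates(f)
```

For example, `for kind, tag, el in read_log("20190918080030_PP.log.gz"): ...` would visit each update in turn. If the real files turn out to be laid out differently (for instance, several documents on one line), only the line-splitting in `iter_updates` should need to change.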
If you are interested in checking out these logs, please contact onl...@raildeliverygroup.com and we will provide the SFTP access credentials. A couple of sample files are attached.

Regards,
RDG

20190918080000_PP.log.gz
20190918080030_PP.log.gz

Chris Bailiss

Sep 19, 2019, 5:28:09 AM
to A gathering place for the Open Rail Data community
Hello RDG

Thanks for taking the initiative here. However, to be honest, I am somewhat confused/conflicted about whether and how to use this.

Modifying existing solutions, testing, etc. is not an insignificant effort, so given the major caveat of "provided as an experiment and is not guaranteed to be provided indefinitely", I am not sure whether this is worth building around. Some kind of time commitment about how long this will be available would help at least a bit with this. E.g. 1 month, 3 months and 1 year are all quite different.

Also, to state the obvious, a single reliable source of data is much easier to consume than trying to combine a potentially unreliable source with another experimental/short-term source.

>> inevitably there will be times when, for whatever reason, NRDP is unable to talk to Darwin even though Darwin itself is operating normally.
This seems to be an admission that the current NRDP architecture has issues that are difficult to resolve. Going back to the discussion in the earlier long thread, a mechanism that logs all Darwin data and is provided as a long-term supported service (for those of us who don't need data absolutely in real time) would both reduce load on the real-time NRDP and likely be more reliable (due to its lower complexity). In essence, a long-term supported version of the experimental service you have offered here. If that were provided long term, I and others would definitely be interested in moving over to it from the real-time feed.

Thanks

Chris