Fwd: NRE launches new arrangements for developers to access live train information


Peter Mount

May 23, 2014, 4:45:05 AM
to openraildata-talk
FYI

---------- Forwarded message ----------
From: <c...@nationalrail.co.uk>
Date: Wed, May 21, 2014 at 1:22 PM
Subject: NRE launches new arrangements for developers to access live train information
To: pe...@retep.org.uk


 



National Rail Enquiries

New arrangements for accessing live train information for developers


I'm delighted to announce that, as part of the rail industry's commitment to transparency, we will from June 2014 be making it much easier for individuals and organisations to access and use train running information when developing apps and other online tools.


Until now, a number of organisations and developers that use the service have been charged.


Public sector organisations and small commercial or private users will now be able to access the Darwin system for free.


Only the biggest commercial or private users, whose services are used more than five million times by their customers in a four week period, will be considered high volume users and will be charged.


The free access granted to public bodies, including Transport for London, passenger transport executives and local authorities, will be regardless of how many requests for information their customers make.


For all users, a licence will no longer be required. Instead, there will be terms and conditions that the user accepts (there will be no approval process): this will make it quicker and easier to set up new services. This will come into effect from 1 June 2014, although the sign-up process will still be manual until the required technical changes have been made.


We're making this change to encourage innovation in the field of providing train time information to customers. If you're interested in using our information services, please get in touch via our Developers Forum on LinkedIn, or email infose...@nationalrail.co.uk.


Other news


Updates to Android and iPhone apps


In other news, this month we released new versions of our iPhone and Android apps to fix some issues that were found in them.


Android
  • Fixed an issue with permissions
iPhone
  • Fixed an issue with the station selector affecting some iOS7 users
  • Fixed an issue with some station names appearing incorrectly
I'd like to apologise to any customers that were affected by any of these problems.


Real time information issues


Recently we've experienced problems with real time information not being available. These issues affected our Online Journey Planner, Live Departure Boards and Train Tracker services.


We've had several incidents where we have lost connectivity to some of our data suppliers. We are working with our partners to achieve a long term solution to the problem to ensure the ongoing stability of our connections.







--
Peter Mount

Phone: +44 (0)7762 028 750
Email: pe...@retep.org.uk
Email: pe...@retep.org
  Web: http://retep.org
  Web: http://trainwatch.co.uk
 XMPP: pe...@retep.org

Chris Bailiss

May 23, 2014, 6:40:27 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Re: the earlier question about the push-port...

From http://www.nationalrail.co.uk/46391.aspx:

When do the new arrangements start?

We will introduce the new arrangements from 1 June 2014. However, we will need to continue signing people up on the current manual process initially as there are a number of changes we need to make to our services in order to implement the automated sign-up process. **We anticipate completing this by October for enquiry service users, and by the end of Q1 2015 for ‘push port’ users.**

Mike Flynn

Jan 29, 2015, 5:48:41 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Has anyone recently applied for the newly available feeds and services?  The Developer Pack says to send Service Licence Requests to infors...@nationalrail.co.uk whilst the Developer Guide says documentation is available at nrelic...@atoc.org.  I sent a note to the latter asking to apply for all feeds but this was a few days ago and I haven't heard anything back. I already have a licence to use the Live Departure Board and so have experience of NRE not being the easiest to communicate with.  But it's been some time since I last did.  I'd be grateful for any advice and news of any recent developments.

Peter Hicks

Jan 29, 2015, 5:58:08 AM
to Mike Flynn, openrail...@googlegroups.com, pe...@retep.org.uk
Hi Mike

On 29 Jan 2015, at 10:48, Mike Flynn <mi...@a1publishing.com> wrote:

> Has anyone recently applied for the newly available feeds and services? The Developer Pack says to send Service Licence Requests to infors...@nationalrail.co.uk whilst the Developer Guide says documentation is available at nrelic...@atoc.org. I sent a note to the latter asking to apply for all feeds but this was a few days ago and I haven't heard anything back. I already have a licence to use the Live Departure Board and so have experience of NRE not being the easiest to communicate with. But it's been some time since I last did. I'd be grateful for any advice and news of any recent developments.

How many days ago? Demand for these feeds is high, so it may take a few days - if I recall, it took a week or so to get a reply to my email. There’s also a bunch of other work going on to get the Darwin Push Port opened up, so it may take a little longer than normal since there are only a finite number of people :-)

I think these semi-static XML feeds are going under a self-service registration system in the not too distant future and, if so, you’ll be able to sign up to access them yourself, just as you can do so for the OpenLDBWS API.

Kind regards,


Peter


Mike Flynn

Jan 29, 2015, 6:13:21 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Hi Peter,

Thanks for that quick reply.  

It's always reassuring to know I'm not being ignored.  It would be better if NRE at least sent an acknowledgement in the meantime but, as you point out, and I might have guessed, there will be a high demand, and it is open data after all.

As it happens, I was granted a licence for these static feeds a while back but was never able to gain access.  Long story but I'll be diplomatic too and say I probably never properly understood the provided instructions.  And I was also busy with other stuff.

As I say though, it's good to know that there is a process going on and that it may well take some days before I hear back.  

I must also say though that it's a superb development.  The possibilities for developers have greatly improved over the past few years.

Cheers
Mike

Mike Flynn

Jan 30, 2015, 7:17:31 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
 Just to update: I've now heard back from NRE and the application process is under way :)

 

warrell harries

Feb 2, 2015, 6:11:50 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Hi All,

I'm in a similar position. I applied back in the summer of 2014 but never heard anything. If someone monitoring the forum can get me some information about how to 'consume' Train Movement data I would be very grateful.

Best regards

Warrell Harries
Voestalpine Signaling UK

Prof Falken

Feb 2, 2015, 7:25:21 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Hi all,

Out of interest, is there anything in DARWIN that I wouldn't see in the NROD datafeeds?

Thanks,

Matt

Chris Bailiss

Feb 3, 2015, 5:49:52 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Darwin has a forecast element to it, that the NROD feeds don't since they focus on reporting actuals.  Some OD sites like RealTimeTrains make their own forecasts based on current NROD  train running information, past NROD train running information and their own forecasting mechanisms.

I am by no means an expert on Darwin, but my understanding is that quite a lot of effort has been put into the forecasting processes.  I guess it might also have more detailed delay explanation information within it, which the NROD feeds do not (other than for cancellations).

Peter has added some information here:

I haven't yet read through it in detail, but from a quick scan just now, one thing that did strike me is that one of the T&Cs says that if you take any data from Darwin and display any forecast information in your site/app, then you must use the Darwin forecast information alone.  That sort of makes sense given how much effort has gone into the forecasting, but I imagine it would prevent, for example, Tom (from RealTimeTrains) from combining any information from Darwin with his already quite sophisticated forecasting.  So while I understand the T&C, it means you can't use multiple feeds to complement each other - you must use theirs or yours alone.  Of course, if the usage of your app/website (e.g. again RealTimeTrains) is very high, the cost may be prohibitive anyway (i.e. apps historically have been one-off purchases but a high volume service using Darwin would have a Darwin running cost attached to it).

One thought that does occur to me:  It would be interesting to know about the future road-maps of the NRE vs. the NROD feeds.  The NROD real time feeds seem to be in maintenance mode, I haven't noticed any plans/talk of additional releases/feeds/information being added.  I don't know about the NRE feeds either.

Chris

Peter Hicks

Feb 3, 2015, 7:00:25 AM
to Chris Bailiss, openrail...@googlegroups.com, pe...@retep.org.uk
Hi Chris

On 3 Feb 2015, at 10:49, Chris Bailiss <cbai...@gmail.com> wrote:
> Darwin has a forecast element to it, that the NROD feeds don't since they focus on reporting actuals.  Some OD sites like RealTimeTrains make their own forecasts based on current NROD  train running information, past NROD train running information and their own forecasting mechanisms.

The design of TRUST was always to report on “what happened” operationally, rather than “what is going to happen”.  Darwin is the reverse of this - “what happened” isn’t quite as important as “what is going to happen”, which is why Darwin has a whole load more inputs and much more service-related data.

The TRUST and Darwin feeds serve two different purposes - I think they’re both valid, depending on what you’re interested in.

The divisive point is Darwin’s forecasts.  Is it better to have:
  • Darwin - A single source of predictions that is consistent across all platforms (Station CIS, NRE website, other websites), but is sometimes wrong and errs on the side of caution
  • Multiple different sources of predictions, each differing from each other and wrong in different ways, creating confusion to passengers
My vote is on Darwin - let’s feed back and fix any problems in one place.

> One thought that does occur to me:  It would be interesting to know about the future road-maps of the NRE vs. the NROD feeds.  The NROD real time feeds seem to be in maintenance mode, I haven't noticed any plans/talk of additional releases/feeds/information being added.  I don't know about the NRE feeds either.

The situation is interesting, and I’ve been looking in to it.

Network Rail’s Traffic Management (TM - not trademark) programme has some really good technology in it that will know the impact of decisions made by signallers and communicate those in real-time.  For example, if you’re going to run a non-stopping train on the slow line (‘A’) that’s going to be stuck behind an all-stations service (‘B’), Darwin will report the first service as gradually losing time because it has no way of knowing that service ‘B’ is going to be holding up service ‘A’ other than seeing what’s happening.  However, the TM system will be able to predict further in to the future.
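To illustrate why knowing about the regulating decision helps, here is a minimal Python sketch of the example above. It is not how Darwin or TM actually work: the function name, the two-minute headway and all the timings are made-up assumptions. The point is only that a forecaster which knows train A will follow train B, with no chance to overtake, can work out A's whole loss of time up front instead of watching it build up report by report.

```python
def predict_following(a_start, a_section_minutes, b_times, headway=2):
    """Predict when following train A reaches each timing point.

    a_start           -- minute at which A enters the first section
    a_section_minutes -- A's unimpeded running time for each section
    b_times           -- minute at which train B (ahead) clears each timing point
    headway           -- assumed minimum gap behind B, in minutes (hypothetical)
    """
    predicted = []
    current = a_start
    for run, b_time in zip(a_section_minutes, b_times):
        # A cannot arrive before its own running time allows, nor closer
        # than the headway behind B, because it cannot overtake.
        current = max(current + run, b_time + headway)
        predicted.append(current)
    return predicted


if __name__ == "__main__":
    # Hypothetical figures: A needs 4 minutes per section when unimpeded,
    # while all-stations B clears the three timing points at minutes 5, 12, 19.
    unimpeded = [4, 8, 12]
    predicted = predict_following(0, [4, 4, 4], [5, 12, 19])
    lateness = [p - u for p, u in zip(predicted, unimpeded)]
    print(predicted, lateness)
```

With these made-up numbers A is predicted at minutes 7, 14 and 21, i.e. 3, 6 and 9 minutes late. A system that only sees actual reports would discover that gradual loss one timing point at a time.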

My prediction is that TM will supplement Darwin and provide more accurate predictions further in the future, and that Darwin will continue to be the public interface to passenger train movement data.

At the moment, I haven’t heard of any decision to make this data available - but I’ll try to find out if there are plans to make this generally available.

What other data would you like to see from Network Rail?


Peter


Tom Cairns

Feb 3, 2015, 11:29:47 AM
to Chris Bailiss, openrail...@googlegroups.com, pe...@retep.org.uk
It’s certainly an interesting debate on the slightly different solutions that NRE and NROD provide.

I’ve put a lot of thought into the potential of utilising Darwin information within Realtime Trains and ultimately have leant to the view that it makes little sense and also isn’t practical from a financial point of view. I have three issues with the way Darwin is made available (licence, financial costs, and the forecasts – not necessarily in that order):

Ultimately, while NRE proclaim that their data licence is the OGL – practically, it isn’t. The additional access conditions that are placed on the data, I feel, make it too commercially focused a policy. I also have a feeling that Darwin by itself doesn’t exactly lend itself to creating products that provide a hugely different perspective on railway operations to the benefit of the passenger. Don’t get me wrong, there will be a few enterprising people who manage to create new, interesting and highly beneficial products from it, but most will be, near enough, clones of what Darwin already provides. Not that I see anything wrong with that – I think playing with data like this does add value as people do projects which use it to good effect on what is best described as a ‘hyperlocal’ basis. I’ve seen sufficient evidence of this personally as Realtime Trains has an API (https://api.rtt.io) and I occasionally look at the product usage from it.

The financial usage costs are clearly commercially leaning and don’t fit with the spirit of open data in providing it ‘at cost’ if a cost is necessary. I calculated a couple of weeks ago, when reviewing usage of RTT and where the costs would land for running with some Darwin information, that I’d be looking at just short of 21 million requests per railway period at the moment (which equates to a ballpark £85k per year). For that price, NRE don’t offer any support as standard and, as far as I can tell, no SLA on their standard output. There seems to be a way of getting support, and potentially an SLA, but I can’t help but think that this will incur further unsustainable costs. The other problem, of course, is that I don’t make anywhere near that amount of revenue from RTT, or even enough to cover a quarter of its cost and, therefore, I can’t entertain the possibility of using any NRE/Darwin data within RTT at present.
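As a rough cross-check of those figures, the Python sketch below puts them next to the threshold in the NRE announcement forwarded at the top of this thread (more than five million uses in a four-week period counts as high volume). Several things here are assumptions rather than NRE's actual rules: that "uses" and "requests" count the same thing, that there are 13 four-week periods in a year, and that dividing the ballpark annual figure by the annual request count says anything useful. The result is only an implied average, not NRE's tariff.

```python
# Threshold from the NRE announcement: more than five million uses in a
# four-week period makes a commercial or private user "high volume".
HIGH_VOLUME_THRESHOLD = 5_000_000

# Assumption: 13 four-week railway periods per year.
PERIODS_PER_YEAR = 13


def is_chargeable(requests_per_period: int, public_sector: bool = False) -> bool:
    """Return True if the user would fall into the charged, high-volume tier."""
    if public_sector:
        # Public bodies get free access regardless of request volume.
        return False
    return requests_per_period > HIGH_VOLUME_THRESHOLD


def implied_cost_per_thousand(annual_cost_gbp: float, requests_per_period: int) -> float:
    """Average cost in GBP per 1,000 requests implied by an annual bill."""
    annual_requests = requests_per_period * PERIODS_PER_YEAR
    return annual_cost_gbp / annual_requests * 1000


if __name__ == "__main__":
    rtt_requests = 21_000_000  # "just short of 21 million" per period
    rtt_annual_cost = 85_000   # ballpark figure quoted in the post
    print(is_chargeable(rtt_requests))
    print(round(implied_cost_per_thousand(rtt_annual_cost, rtt_requests), 2))
```

On these assumptions RTT would be well over four times the free threshold, and the ballpark works out at roughly 31p per thousand requests.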

The other problem is, of course, that you have to use Darwin forecasts. I’ve spent quite a lot of time over the last 6 months thinking about this and the route I should take on the Darwin access. Had Darwin been released on the basis that the usage of its predictions was not required but advised (so you could get around issues of insane 12 hour delays at starting stations which have been all too frequent recently) then I likely would have made efforts to investigate it further. The other thing that was coming into my consideration was the issue of Rockshore’s instability… two sources are ultimately better than one for running information. But it’s all come down to that licensing condition in the end – I should be able to pick and choose when I think the ‘truthful’ forecast data is simply inaccurate and present data that I have calculated to be closer to the ‘truth’, regardless of whether it’s come from the ‘single source of the truth’.

So that’s my reasoning on the situation – it’s not an ideal situation for me but my hand is forced by costs. Instead, I’ve been examining further opportunities to improve my forecasting algorithms. The primary problem right now, if you solely use NROD information, is the lack of short notice alteration information – such as fail to stop/special stop orders, which could be provided by a system called Tyrell. Darwin has its own workstation input as well which goes directly into it – so by no means is the Tyrell data complete. I’ve tried seeking direct access to Tyrell with a couple of operators, which ultimately got nowhere – primarily on the basis of “why don’t you use Darwin instead?”, a fair question. My understanding is the DfT gave ATOC/NRE funding a couple of years back for specific provision of alteration information – supposedly for the functionality of what is now behind the timetable changes feed. If this is the case, then it would be disappointing if NRE licensed this on the same conditions as the rest of the feeds – given that DfT contributed towards the cost and the Govt push towards fully open data – but I suspect it will be. Given the difficulties of accessing the missing links separately, being able to use the Darwin timetable changes feed would fill the gap for most people as long as it could be provided on a proper open data basis without any onerous NRE licensing conditions on top of it.

The other side of the problem is, of course, forecasting the future properly. I’ve been developing RTT’s algorithms further in recent months to properly calculate likelihoods of overtaking and ‘train following’, which is providing quite reliable results now. The next step is to investigate what part S class can play in this situation by understanding further the movements of individual services. Having done a lot of this work, I think there is actually quite a compelling case for not having just a single source of the truth. A single source of the truth is great, IMHO, when you can guarantee that the truth is pretty reliable and realistic – Darwin however can be fairly optimistic in some of the scenarios that it forecasts. This does help guarantee that people make a particular train but there are times when a pessimistic estimate can be more helpful – particularly for managing expectations, which the railway struggles at times to do IMHO. Personally, I’d love to see more services provide alternatives as I think it only furthers the case to compel everyone who does provide forecasts to up their game and increase accuracy.

Tom

(apologies if this doesn’t make much sense in parts – am quite ill at the moment but felt the need to dive in)

--
Tom Cairns


Peter Hicks

Feb 3, 2015, 1:53:59 PM
to openrail...@googlegroups.com

On 03/02/15 16:29, Tom Cairns wrote:
> The other problem is, of course, that you have to use Darwin forecasts. I’ve spent quite a lot of time over the last 6 months thinking about this and the route I should take on the Darwin access. Had Darwin been released on the basis that the usage of its predictions was not required but advised (so you could get around issues of insane 12 hour delays at starting stations which have been all too frequent recently) then I likely would have made efforts to investigate it further. The other thing that was coming into my consideration was the issue of Rockshore’s instability… two sources are ultimately better than one for running information. But it’s all come down to that licensing condition in the end – I should be able to pick and choose when I think the ‘truthful’ forecast data is simply inaccurate and present data that I have calculated to be closer to the ‘truth’, regardless of whether it’s come from the ‘single source of the truth’.

"Single source of the truth" doesn't mean that truth has to be correct - it means it has to be consistent.  The Developer Guidelines at http://www.nationalrail.co.uk/static/documents/Developer_Guidelines_v_03-05.pdf say, in section 2.3.5:

    " If You show train arrival and/or departure predictions in any End User Product that incorporates information derived from Darwin, those predictions must come from Darwin. We insist on this requirement as National Rail Enquiries aim to achieve consistency across all channels in order to expedite industry progress towards the greatest possible level of accuracy. To achieve this, the industry has invested in linking Darwin with a number of rail industry systems to create a centralised source information. Darwin is a core element of the industry’s Customer Information Strategy which is to provide a consistent view of real-time train running information. In support of the Strategy, and as a valued citizen of the developer community, we ask that You contribute by reporting any errors, anomalies and/or omissions in the Feeds to servic...@nationalrail.co.uk so we can fix issues at source."

I think that's fair - letting everyone have access to the data but asking for feedback where the data isn't accurate so it can be fixed centrally.

Would you be willing to share some examples (both general and specific) of where RTT's predictions are closer to reality than Darwin?  I think it'd be of benefit to everyone.


Peter

Tom Cairns

Feb 3, 2015, 2:27:43 PM
to Peter Hicks, openrail...@googlegroups.com
I believe, honestly, that consistency across all channels is not necessarily the answer to the problem. Passengers who seek alternatives are generally intelligent enough to discern the differences and choose for themselves which they can trust more. RTT doesn’t seek to be a single source of the truth – it never has and it never will – but at the moment it does act as a popular alternative.

No single system can be guaranteed to be accurate 100% of the time. However, having more than one source could actually benefit passengers. If two (or more) sources agree about a particular forecast then the logical probability of it being accurate is higher than if it were just one. If there is just one source, then said source can effectively run unchecked and not subject to critique. In a lot of other industries, competition is arguably incentivised this way as it only serves, as I said in my previous message, for everyone to up their game.
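The claim about agreement can be made concrete with a small Python sketch. It rests on a deliberately simple model that is an assumption, not anything taken from Darwin or RTT: the two sources err independently, they agree whenever both are right, and when both are wrong they still happen to agree with some probability `chance_agreement`. All names and numbers are hypothetical.

```python
def prob_correct_given_agreement(p1: float, p2: float, chance_agreement: float) -> float:
    """Probability a forecast is right, given that two sources agree on it.

    p1, p2           -- probability that each source is right on its own
    chance_agreement -- probability that two wrong forecasts coincide
    """
    both_right = p1 * p2
    # If exactly one source is right they disagree, so the only other way
    # to agree is for both to be wrong in the same way.
    both_wrong_but_agree = (1 - p1) * (1 - p2) * chance_agreement
    return both_right / (both_right + both_wrong_but_agree)


if __name__ == "__main__":
    # Hypothetical accuracies: each source is right 80% of the time, and two
    # wrong forecasts match half the time.
    print(round(prob_correct_given_agreement(0.8, 0.8, 0.5), 3))
```

With those made-up figures, agreement lifts confidence from 0.8 to about 0.97. The gain shrinks if the two sources tend to make the same mistakes, for example because they share the same input data, which this sketch does not model.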

In any case, the general differences that I find between RTT and NRE are from my own personal use of the rail network (which, for personal reasons, has waned significantly recently) and to provide specific examples I’d have to go back through my travel history. I certainly have little time at the moment to go and write something that compares predictive times between the two. Comparing the accuracy between the two isn’t an entirely fair playing field either - in conditions of service disruption, NRE has the upper hand because it has access to more data which those who can’t or don’t use it clearly don’t have access to. I still think that there should be a basis in getting specific short notice service changes out in the open on a fair and reasonable basis that doesn’t involve additional licensing constraints over and above the OGL. 

Either way, most of this is moot as I simply haven’t the desire to investigate the deep inner workings of Darwin when I have no wish to use it with a price tag of £85,000 per year.

Tom.

--
Tom Cairns

Where-in-Sussex

Feb 3, 2015, 3:17:18 PM
to openrail...@googlegroups.com, pe...@retep.org.uk
Saw these two on Twitter last month:

Chris Bailiss

Feb 4, 2015, 4:28:56 AM
to openrail...@googlegroups.com, cbai...@gmail.com, pe...@retep.org.uk
Hi Peter

Thanks for the reply.

>> The TRUST and Darwin feeds serve two different purposes - I think they’re both valid, depending on what you’re interested in.

Agreed, I wasn't trying to suggest otherwise, just explain the difference in emphasis between the two.

In terms of a road-map for the future, may I first ask:  Is the NROD platform, back at Network Rail HQ, regarded as finished (i.e. in maintenance-mode with no significant enhancements envisaged) or is it regarded as something that will still be expanded on?  I think this is quite a key question as developers consider where to invest their time/money now that two choices are available.  Although NROD / NRE offer different things in different ways with different costs/T&Cs, there is some overlap between them.

>> What other data would you like to see from Network Rail?

1)  Delay attribution information at the level of individual services - either in real time or, if this is attributed later as I guess may be the case, in a separate feed / data extract.  Delays are clearly a very important aspect of how a railway is performing and to not have that data opened up in the Open Data platform is strange.  I have a vague memory there is an explanation of why this is difficult and so wasn't included initially.  I have no problem if there is/was a technical reason why it wasn't done initially - of course you have to start somewhere, but I do think it should be a priority to get it opened up.  
2)  Delay explanation information at the level of individual services - i.e. sounds similar to above, so let me explain.  I assume the official delay attributions are not done in real time, yet, clearly, there is some delay explanation information floating around in the railway's IT systems in close to real time.  E.g. the live departure boards include some explanation of why a service is late.  This may not be an official attribution, but it is an (unofficial at that point in time) explanation that is useful to have.  It looks as though this may be in Darwin, but again, it strikes me as though it should be part of the Open Data platform too.
3)  Short notice alterations, as explained by Tom earlier.  I appreciate for many people this would be their first priority, but my use-case is more retrospective reporting, so the above is more important for me.

>> My vote is on Darwin - let’s feed back and fix any problems in one place.
This doesn't affect my use-case as I don't display forecasts, but my view is:  I completely understand the "single version of the truth" point (I work in BI).  I think my approach to the issue would have been slightly different though.  I might have mandated that the pure Darwin forecasts have to be one of the options for displaying the forecast information in apps, websites, etc, perhaps even the default choice, but I would allow it to be used with other complementary sources of information i.e. those could be switched on by the user, provided the user has made a clear choice to do that.  I might even prescribe in the T&Cs the text that the choice has to include when switching from the default of Darwin forecasts alone - and I might even go a step further and mandate that the user has to repeat this choice at specified intervals (e.g. every 3 months), but ultimately I would allow it.  I appreciate this is a more involved approach and that perhaps the view is this would be too complex for the general public.
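A minimal Python sketch of how an app might implement that suggestion is below. It is purely illustrative of the proposal and is not part of NRE's terms: the function name, the return values and the choice of 90 days to stand in for "every 3 months" are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "every 3 months" is approximated as 90 days.
RECONFIRM_AFTER = timedelta(days=90)


def forecast_source(opted_in_on: Optional[date], today: date) -> str:
    """Decide which forecasts to show a user.

    Darwin is the default. Alternative forecasts are shown only if the user
    made an explicit choice, and that choice has not yet expired.
    """
    if opted_in_on is None:
        return "darwin"
    if today - opted_in_on > RECONFIRM_AFTER:
        # The choice has lapsed: fall back to Darwin until the user confirms again.
        return "darwin"
    return "alternative"


if __name__ == "__main__":
    print(forecast_source(None, date(2015, 2, 4)))
    print(forecast_source(date(2015, 2, 4), date(2015, 3, 1)))
    print(forecast_source(date(2015, 2, 4), date(2015, 6, 1)))
```

The key design point is that the fallback is always Darwin, so a user who never engages with the setting only ever sees the consistent forecasts.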

>> Network Rail’s Traffic Management (TM - not trademark) programme has some really good technology in it that 
>> will know the impact of decisions made by signallers and communicate those in real-time.
Always interesting to gain a better understanding of systems/developments.  Thanks for highlighting.

>> My prediction is that TM will supplement Darwin
I assume by this you mean it will become another feed into Darwin?  While that makes perfect sense to add for Darwin to improve its accuracy, I hope some of this information could also be accessed via the OD platform.  Otherwise, I would echo Tom's point, that there is an issue whereby a publicly funded system could only be accessed via a potentially expensive NRE API.  Don't get me wrong, NRE have made some very good and welcome progress in opening up recently, but it strikes me that they are still caught half-way between being fully open (offering services free / at-cost similar to a publicly funded body) and operating commercially - the pricing model of the push-port being the obvious example here.

Chris

Chris Bailiss

Feb 4, 2015, 4:35:39 AM
to openrail...@googlegroups.com, cbai...@gmail.com, pe...@retep.org.uk
To add to my post above... a key question I realised I don't have the answer to:

Is Darwin a Network Rail system or a NRE system?  If it is a NRE system, then it makes sense there is a commercial element to usage of data deriving from that system - though I would still say the National Rail data sources that feed it should be available via the OD platform.

Thanks

Chris

Chris Bailiss

Feb 4, 2015, 4:49:41 AM
to openrail...@googlegroups.com, cbai...@gmail.com, pe...@retep.org.uk
Typo (apologies for the flurry of emails):

>> though I would still say the National Rail data sources that feed it should be available via the OD platform.
...should read...
though I would still say the Network Rail data sources that feed it should be available via the OD platform.

Chris

Tom Cairns

Feb 4, 2015, 5:43:59 AM
to cbai...@gmail.com, openrail...@googlegroups.com, pe...@retep.org.uk
Darwin is an NRE system.

--
Tom Cairns

Peter Hicks

Feb 4, 2015, 5:45:14 AM
to Chris Bailiss, openrail...@googlegroups.com, pe...@retep.org.uk
Hi Chris

On 4 Feb 2015, at 09:28, Chris Bailiss <cbai...@gmail.com> wrote:

> In terms of a road-map for the future, may I first ask: Is the NROD platform, back at Network Rail HQ, regarded as finished (i.e. in maintenance-mode with no significant enhancements envisaged) or is it regarded as something that will still be expanded on? I think this is quite a key question as developers consider where to invest their time/money now that two choices are available. Although NROD / NRE offer different things in different ways with different costs/T&Cs, there is some overlap between them.

As far as I’m aware, there aren’t any plans to add further real-time feeds right at the moment. However, that’s a snapshot of ‘now’, it may change in the future - the Traffic Management data is new and beginning to roll out within NR internally.

That said, the following is probably accurate:

- If you’re looking to do simple departure boards on request, use the OpenLDB Web Service
- If you’re looking to do anything relating to alerting on specific events, such as a train being cancelled, or on a large scale, use the Darwin feeds
- If you’re looking for low-level details which are generally only of interest to industry or advanced users, use the Network Rail feeds

As always, there will be exceptions, so this is a fairly broad brush analysis.
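The rule of thumb above can be written down as a small Python helper. This is only a sketch of the three bullet points: the enum, the function name, the boolean flags and the order in which they are checked are all hypothetical, and as noted there will be exceptions.

```python
from enum import Enum


class Source(Enum):
    OPENLDBWS = "OpenLDB Web Service"
    DARWIN = "Darwin feeds"
    NROD = "Network Rail feeds"


def suggest_source(needs_alerting: bool = False,
                   large_scale: bool = False,
                   needs_low_level_detail: bool = False) -> Source:
    """Map a broad-brush description of a use case to a data source."""
    if needs_low_level_detail:
        # Low-level operational detail, mainly of interest to industry
        # or advanced users. Checking this first is an assumption.
        return Source.NROD
    if needs_alerting or large_scale:
        # Alerting on specific events (e.g. a cancellation), or anything
        # on a large scale.
        return Source.DARWIN
    # Simple departure boards on request.
    return Source.OPENLDBWS


if __name__ == "__main__":
    print(suggest_source().value)
    print(suggest_source(needs_alerting=True).value)
    print(suggest_source(needs_low_level_detail=True).value)
```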

> 1) Delay attribution information at the level of individual services - either in real time or, if this is attributed later as I guess may be the case, in a separate feed / data extract. Delays are clearly a very important aspect of how a railway is performing and to not have that data opened up in the Open Data platform is strange. I have a vague memory there is an explanation of why this is difficult and so wasn't included initially. I have no problem if there is/was a technical reason why it wasn't done initially - of course you have to start somewhere, but I do think it should be a priority to get it opened up.

Individual delay attribution doesn’t happen in real-time in TRUST. When a train is delayed (‘delay’ being defined as a change in lateness - a 5 minute delay at origin will result in the train running 5 minutes late) by three or more minutes, a system called TRUST DA (Delay Attribution) creates a Delay Alert for the train, which is then attributed to an existing incident or, if it’s something like station overtime, a new incident is created.
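The definition of 'delay' as a change in lateness can be sketched in a few lines of Python. This is not TRUST DA's actual logic, which is not described in this thread in any detail: the function name, the input format, the location names and the assumption that the train is on time before its first report are all made up for illustration. Only the three-minute threshold comes from the explanation above.

```python
from typing import List, Tuple

# From the post: a Delay Alert is raised for a delay of three or more minutes.
DELAY_THRESHOLD_MINUTES = 3


def delay_events(reports: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Find the points where a train lost three or more minutes.

    reports -- (location, minutes late) pairs in running order
    Returns (location, delay) pairs, where delay is the increase in
    lateness since the previous report.
    """
    events = []
    previous = 0  # assumption: the train is on time before its first report
    for location, lateness in reports:
        delay = lateness - previous
        if delay >= DELAY_THRESHOLD_MINUTES:
            events.append((location, delay))
        previous = lateness
    return events


if __name__ == "__main__":
    # Hypothetical run: 5 late at origin, holds 5, loses 4 more, then recovers.
    run = [("Origin", 5), ("A", 5), ("B", 9), ("C", 10), ("D", 7)]
    print(delay_events(run))
```

For this made-up run the sketch flags the 5 minute delay at the origin and the further 4 minutes lost at B. The train is still late at A, but its lateness did not change there, so no new delay is recorded.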

DA is fairly broad-brush but also very detailed. It’s definitely not real-time, and it’s a contentious business as the industry looks to attribute delay to the right ‘responsible manager code’ (four characters). There’s also free text in incidents, but this isn’t in a format suitable for public consumption as it may contain industry jargon, personal information and/or simply be inaccurate when it’s first entered.

From a technical standpoint, there’s not much in the way. From a political/industry standpoint, getting approval of everyone concerned and ensuring the data is analysed properly is difficult - especially when that data can change and isn’t designed for real-time analysis.

Darwin contains, if it’s provided, train-specific delay information in standardised format. For example, “This train has been delayed due to a broken down train”. In the Delay Attribution world, ‘broken down train’ may not actually appear anywhere - ‘failed loco’, ‘hydraulic fluid leak’, ‘AWS failure’ can all result in the same end result.

It may be possible to get historical, post-incident, sensitive-data-removed delay information published, as this is held in a data warehouse and, I believe, could probably be exported fairly easily. However, that’s historical and doesn’t tick a real-time box.

> 2) Delay explanation information at the level of individual services - i.e. sounds similar to above, so let me explain. I assume the official delay attributions are not done in real time, yet, clearly, there is some delay explanation information floating around in the railways IT systems in close to real time. E.g. the live departure boards include some explanation of why a service is late. This may not be an official attribution, but it is an (unofficial at that point in time) explanation that is useful to have. It looks as though this may be in Darwin, but again, it strikes me as though it should be part of the Open Data platform too.

If the reason for a delay has been entered in to CIS at a location and that information is push-able from said CIS to a central point (and not all information is, although many CIS are connected in to Darwin, so conceptually it could be transmitted) and available in a standardised format, then I agree, it’s useful. However, from a Network Rail point of view, it’s relating to train services, not running the infrastructure, so should probably be available in Darwin itself.

There’s also other service information, such as train length, orientation, “front four coaches for…”, “1st Class is at the rear”, which is held within station CIS and isn’t open. That data will come from TOCs, so should probably sit in Darwin as it’s passenger-focussed, not ‘running the network’-focussed.

> 3) Short notice alterations, as explained by Tom earlier. I appreciate for many people this would be their first priority, but my use-case is more retrospective reporting, so the above is more important for me.

If an alteration is input directly in to Darwin, then it’s available through the Push Port and other Darwin-powered platforms. I don’t see any reason you couldn’t pick alterations to the public-facing side of things from Darwin and tie them up with VSTP messages received from Network Rail to paint a picture of the railway as a whole, not just trains run by TOCs (as opposed to freight operators, or engineering trains).

Tyrell, which is much talked about as a silver bullet, is only as good as the data input to it. Whilst it does feed in to Darwin and specific service-related messages could also feed in to other systems (e.g. cancelled stops, additional trains, advance notice of delays/diversions), this data will also probably be available through the Darwin Timetable feed - I’ll need to check. That’ll include both Tyrell-sourced alteration data *and* data input direct in to Darwin.

> >> My vote is on Darwin - let’s feed back and fix any problems in one place.
> This doesn't affect my use-case as I don't display forecasts, but my view is: I completely understand the "single version of the truth" point (I work in BI). I think my approach would have been slightly different to the issue though. I might have mandated that the pure Darwin forecasts have to be one of the options for displaying the forecast information in apps, websites, etc, perhaps even the default choice, but I would allow it to be used with other complementary sources of information i.e. those could be switched on by the user, provided the user has made a clear choice to do that. I might even prescribe in the T&Cs the text that the choice has to include when switching from the default of Darwin forecasts alone - and I might even go a step further and mandate that the user has to repeat this choice at specified intervals (e.g. every 3 months), but ultimately I would allow it. I appreciate this is a more involved approach and that perhaps the view is this would be too complex for the general public.

This sounds like a good half-way house to me but playing devil’s advocate, there are cases where this might not be appropriate - such as if the information is presented in a means the user can’t easily change, such as on a public display in a pub next door to a station.

However, looking at it from a different point of view - how consistent are other people’s forecasts when compared to Darwin, and how consistent are Darwin’s forecasts when compared to actuals? I’d really like this to be out in the open so evidence can speak for itself. RealTimeTrains has been wholly incorrect for me in cases where Darwin has been correct - for example, advertising a Bedwyn train from platform 1 at Paddington when the station CIS said a higher numbered platform - but that’s because, as I understand it, Tom’s system works off train describer inputs and if they’re wrong (because the signaller does not always know about a set swap as far in advance as the station), then it’s presenting a different point of view, one which could cause conflict at the station if staff have to explain why a non-Darwin powered system is showing different information to Darwin. For the record, OpenTrainTimes’ map of Paddington also showed the train at platform 1.

This is not a dig at Tom, nor a claim that Darwin is always right - but comparing the two side by side (for example, taking several months of RealTimeTrains’ predictions against Darwin’s predictions and comparing them) will show, I hope, that Tom’s solved some specific problems in forecasting that Darwin has, but that Darwin is also more accurate than RTT in other ways.
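
One possible way to run that kind of side-by-side comparison is to score each source's forecasts against what actually happened. The Python sketch below is only an illustration: the record layout, the `mean_absolute_error` helper and every number in `sample` are invented for the example and are not real Darwin or RealTimeTrains data.

```python
from statistics import mean


def mean_absolute_error(records, source):
    """Average absolute gap (in minutes) between a source's forecast and the actual.

    Records where the source made no forecast (None) are skipped.
    """
    errors = [
        abs(record[source] - record["actual"])
        for record in records
        if record.get(source) is not None
    ]
    return mean(errors) if errors else None


# Invented sample: forecast and actual lateness in minutes for three calls.
sample = [
    {"actual": 4, "darwin": 3, "other": 6},
    {"actual": 0, "darwin": 2, "other": 0},
    {"actual": 10, "darwin": 10, "other": None},  # no forecast from "other"
]

darwin_error = mean_absolute_error(sample, "darwin")
other_error = mean_absolute_error(sample, "other")
```

A single average hides where each source does well, so a real comparison would probably want to break the errors down by location, time to departure and type of disruption.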

Ultimately, if incorrect information is sent out of Darwin, then it’s an industry problem to solve, rather than a problem with one person’s interpretation of what might happen. I’d rather shift the ‘blame’ for my site displaying incorrect information to somebody else, if they’re willing to accept it :-)

> >> My prediction is that TM will supplement Darwin
> I assume by this you mean it will become another feed into Darwin? While that makes perfect sense to add for Darwin to improve its accuracy, I hope some of this information could also be accessed via the OD platform. Otherwise, I would echo Tom's point, that there is an issue whereby a publicly funded system could only be accessed via a potentially expensive NRE API. Don't get me wrong, NRE have made some very good and welcome progress in opening up recently, but it strikes me that they are still caught half-way between being fully open (offering services free / at-cost similar to a publicly funded body) and operating commercially - the pricing model of the push-port being the obvious example here.

My strictly personal view is that the TM data should be available from Network Rail as a ‘view of the railway’, and that data should be available through other systems, augmented with additional passenger-facing data, on whatever terms are most suitable to the organisation.

It would be absolutely brilliant for NRE to make all their data free at the point of use and available for whatever purpose people want to use it - but looking at it pragmatically, evolution is better than trying to get everything ‘right first time’, and a commitment to evolving and reviewing things as they happen could be seen as more important than trying to sort everything out before anyone’s using data.


Peter


Martin Swanson

Feb 4, 2015, 6:13:25 AM2/4/15
to Peter Hicks, Chris Bailiss, openrail...@googlegroups.com, pe...@retep.org.uk
My reservation about Darwin is that it is controlled by ATOC. It is therefore influenced by public companies. I notice there is still a lot of useful data that is kept private - routing guide, delay attribution, ticket pricing etc. I'm not sure you can really call this open data - it is data taken from Network Rail, processed by a body controlled by public companies, and then licensed under terms that don't seem wholly attractive.

What I would like to see is a full roadmap to enhance and develop the Network Rail / NROD feed to ensure there is a viable open source alternative to Darwin.

I also think this community could do more to build common services to support NROD, rather than solving the same problems individually (ref data, locations, queue infra etc).

Peter - have you thought about creating a GitHub project for us all to contribute to?

Martin
> --
> You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
> To post to this group, send an email to openrail...@googlegroups.com.

Tom Cairns

Feb 4, 2015, 6:16:09 AM2/4/15
to Martin Swanson, Peter Hicks, Chris Bailiss, openrail...@googlegroups.com, pe...@retep.org.uk
It’s probably worth pointing out that ticket prices are actually out in
the open via RSP (http://data.atoc.org) albeit only updated per fares
manual release I believe. The missing bit on that one is the lack of
access to NRS to verify the availability of advance tickets.

Tom

--
Tom Cairns

Martin Swanson

Feb 4, 2015, 6:20:58 AM2/4/15
to openrail...@googlegroups.com
A GitHub project would let us build a backlog of feature requests - including data we want opened up - as well as share code.

Any takers?

Peter Mount

Feb 4, 2015, 6:26:19 AM2/4/15
to Martin Swanson, openraildata-talk

My codebase is already on GitHub so I don't see why not - at least tracking requests & shared code would be a good idea.

Chris Bailiss

Feb 4, 2015, 7:03:54 AM2/4/15
to openrail...@googlegroups.com, cbai...@gmail.com, pe...@retep.org.uk
Hi again Peter

Thanks for the detailed reply.

Re: the Delay Attribution Process/Data - your description of the process makes sense.  I guessed (or perhaps vaguely remember from the last time it came up on here) that this was the case.  That's why I specifically split attribution and unofficial close-to-real-time explanation into two separate items.  Your explanation makes it even clearer that these are two different sets of data - i.e. attribution being more official and more carefully considered whilst the close-to-real-time explanations are data specifically about running train services and so different.

I guess that a key point is also:  who owns and manages the attribution process? And so has the master data on the subject?  I had (perhaps wrongly) assumed it would be Network Rail as the closest thing to a national public body, however, as many of the delays would be attributed to them, perhaps that isn't the case.

>> It may be possible to get historical, post-incident, sensitive-data-removed delay information published, as this is held in a data warehouse 
>> and, I believe, could probably be exported fairly easily.  However, that’s historical and doesn’t tick a real-time box. 

The Delay Attribution data clearly isn't fully (or even mainly) real time by the nature of the attribution process.  Given how involved that process is, it is also perfectly understandable that the attribution for a particular delay could be revised later, up to (I guess) a certain cut-off time when the attribution is presumably set.  All in all, I would still like to have access to that data since delays are clearly an important aspect of the railway (e.g. in the form of an extract that could be run multiple times to support the revision process).  So an export, which as you suggest may be technically possible, sounds like a reasonable place to start.

>> From a political/industry standpoint, getting approval of everyone concerned and ensuring the data is analysed properly is 
>> difficult - especially when that data can change and isn’t designed for real-time analysis.

I can indeed imagine this to be the larger stumbling block.  However, I don't think it is sufficient just to say that because it is contentious / difficult the data should not be open.  I would hope that having an extract (e.g. daily or weekly) that covers the last three or six months, so supporting the revisions as the data evolves, would be a sensible starting point.

But, as I said earlier, I guess a key part of this whole discussion is who manages the attribution process and thus has the master data on the topic?

>> If you’re looking to do anything relating to alerting on specific events, such as a train being cancelled, or on a large scale, use the Darwin feeds

This makes some sense from the single truth / it is clearly a reasonably good source of forecasting point of view.  However, there is something of a contradiction and inherent tension in the industry recognising / driving / mandating Darwin whilst simultaneously applying a charging model over the top that for very popular applications will be prohibitively expensive.

>>>> I think my approach would have been slightly different to the issue though.  I might have mandated that the pure Darwin forecasts 
>>>> have to be one of the options for displaying the forecast information in apps, websites, etc, perhaps even the default choice, but I 
>>>> would allow it to be used with other complementary sources of information
>> This sounds like a good half-way house to me but playing devil’s advocate, there are cases where this might not be appropriate
>> such as if the information is presented in a means the user can’t easily change, such as on a public display in a pub next door to a station.

Fair point but one that could probably be managed via T&Cs.

>> (RealTimeTrains for example has) solved some specific problems in forecasting that Darwin has, but that Darwin is also more accurate than RTT in other ways.

Again, I would agree this is likely to be the case.  And I would suggest this is probably always likely to be the case, and is therefore a reason in favour of allowing different forecasts.  It is possible that RTT (for example) has solved some specific issues in certain local areas, therefore users in those areas would in particular find an alternative application utilising both sources useful.  I do agree too that feedback about Darwin is important to allow it to be improved in those areas but it is likely individuals will be able to innovate / improve more quickly.

>> My strictly personal view is that the TM data should be available from Network Rail as a ‘view of the railway’, and that data should be available 
>> through other systems, augmented with additional passenger-facing data, on whatever terms are most suitable to the organisation. 

Again, I agree, but it is difficult to drive/mandate a particular system such as Darwin as a single point of truth for certain data when there is also a cost model working against that.  This is to some extent a reflection of the privatisation / fragmentation of the railway.  At least, as you say, the industry and NRE are starting to open up and perhaps the cost model will evolve.

Chris

Mike Flynn

Feb 4, 2015, 7:09:41 AM2/4/15
to openrail...@googlegroups.com, pe...@retep.org.uk
I started out with National Rail's Darwin, making good use of their LDB service. Licensing, documentation and communication were a constant struggle, and until recently I wasn't able to monetize sites using it.  So I was keen to augment my data sources with the various feeds available from Network Rail.  I was already using CIF but was particularly keen to be in a position to replace Darwin.  To this end, and realising I needed to be running at least a VPS rather than my existing shared accounts, I decided to first concentrate on upgrading my hosting facilities. Because of that and other work I was involved with, I put on hold any development work with the Network Rail feeds.

That was a while back and now things have moved on.  I can run adverts on my sites.  Communication with National Rail seems to be better: to the point they almost seem to be willing partners.  (Well, almost!  Personally, I think they see the 'competition' out there.  Now they're moving towards a situation where they are to be single source of passenger information data.  And they're opening up new revenue streams.)  I don't believe the cost to be a barrier to me: I average about 0.5p of income per pageview which is 10 times more than the cost of Darwin queries.  And there are new feeds and services available.  For me, with the limited resources I have, my best way forward is to concentrate on the existing and proposed feeds and services from Darwin.  
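
To spell out the arithmetic behind those figures, here is a small Python sketch. It reads "10 times more than the cost" as income being ten times the cost, and it assumes one Darwin query per pageview; both are assumptions, as is the `margin_in_pounds` helper, and none of this reflects NRE's actual price list.

```python
from decimal import Decimal

income_per_pageview = Decimal("0.5")  # pence, the figure quoted above
income_to_cost_ratio = Decimal("10")  # reading "10 times more than the cost" as 10x

# Assumption: each pageview triggers exactly one Darwin query.
cost_per_query = income_per_pageview / income_to_cost_ratio
margin_per_pageview = income_per_pageview - cost_per_query


def margin_in_pounds(pageviews):
    """Net income in pounds for a given number of pageviews (100 pence per pound)."""
    return margin_per_pageview * pageviews / Decimal("100")


print(cost_per_query)       # 0.05 (pence per query)
print(margin_per_pageview)  # 0.45 (pence per pageview)
```

On those assumptions the query cost takes a tenth of the advertising income, which is why it is not a barrier at this ratio; pages that need several queries each would change the picture.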

I've largely been a bystander regarding this forum but do mostly read the posts and am aware of the various developers out there.  Those already working with the Network Rail feeds (I think most of you) are undoubtedly at a crossroads.  Which way to go?  One, the other or both?  I think ATOC/National Rail's Darwin is going to win this one.  I do also think there is a place for more than one system but unless it can be kept cutting edge, provide useful additional functionality and find a marketplace other than for passenger information I think it'll struggle. I agree with Tom Cairns, data from a single source though displayed in various devices and formats can and will become somewhat homogeneous.  But I think that's the point!  You know, it wasn't always standard gauge.

To some of us it's a passion or hobby, to others a business.  Either way I'm glad to say again open data, not least as regards UK rail, is evolving for the better (though some might still disagree).

Tom Cairns

Feb 4, 2015, 7:44:09 AM2/4/15
to cbai...@gmail.com, openrail...@googlegroups.com, pe...@retep.org.uk
I am in no doubt that there are issues with some data that RTT gives out, but most of these issues can be boiled down to three primary situations:

1) lack of short notice changes (fail to stop/special stop order information/cancellations/additional services)
2) platform changes at origins that aren’t told to signallers before they’re advertised (although TBH, IME this is quickly resolved)
3) extended station dwell times due to whatever reason (e.g. crew changes and crew not available)

All three of these go down to a manual, human-involved process at present – and they’re not even automated forecasts really, they just impact upon them. Pretty much all others I am confident can be resolved in code using whatever algorithm necessary. I believe that fail to stop is part of the delay attribution framework within TRUST but much of the rest of it is passenger related (despite having an operational impact) so isn’t available from Network Rail. Although I think that NR should certainly have all of these available to them as they ultimately do run the infrastructure and the signalling…

I can see the desire of many to not want to go down the route of rolling their own systems for predictions but it is an interesting challenge and one that some will find interesting (I certainly do). Of those that find it an interesting challenge and make it public, some may well stand by their predictions subject to the restrictions under which they can be made due to a lack of acceptable data sources. I sit in this camp and I’m fairly open about what can and can’t be done – it’s listed on a page on the website.

As for improving forecasts, the lack of data issue to sort out those three above issues has meant that I’ve been concentrating on looking at the forecast issues caused by trains interacting with each other. I genuinely find this kind of thing very rewarding – and the results are very promising. But, again, unless those three above bits of information can be released on a properly open basis then ultimately it does lean towards an eventual conclusion of what I am doing with RTT right now if it can’t do some, what I think are, very simple things. If the railway is interested in ensuring that everyone is singing off the same hymn sheet, then it should do its best to release the bits that are requested. 

Much of the cost of Darwin seems to be related to its operation and development, but the three issues highlighted all originate from TOCs (or Network Rail, particularly in the example of 1 & 2) and have little to do with Darwin’s forecasting system at the end of the day, so the cost basis of providing such information should be minimal from whichever source. I went to a day at the ODI a couple of months ago about railways and there was a talk from one of the ODI people about how to account for cost of providing such data: he summarised that releasing data cannot be costed in what would be considered typical methods. It went along the lines of that the financial cost from releasing such data could be considered in terms of understanding net benefit to the economy and general public and that if a cost is required, it should be at cost or less. 

If NRE feel they need to charge for releasing their forecasting to high volume users, then so be it, as I know full well from my own experience that it requires a fair amount of compute power. However, unless they are charged through the nose for data like the issues stated (and if they are, that should be investigated, as the TOCs provide a public service and there shouldn’t be a huge cost in providing information about it), I really don’t see why this can’t be distributed separately under terms that are more in the spirit of open data.

Tom.

--
Tom Cairns

Tom Cairns

Feb 4, 2015, 8:22:56 AM2/4/15
to Mike Flynn, openrail...@googlegroups.com, pe...@retep.org.uk
On 04/02/2015 12:09, "Mike Flynn" <mi...@a1publishing.com> wrote:

> That was a while back and now things have moved on.  I can run adverts on my sites.  Communication with National Rail seems to be better: to the point they almost seem to be willing partners.  (Well, almost!  Personally, I think they see the 'competition' out there.  Now they're moving towards a situation where they are to be the single source of passenger information data.  And they're opening up new revenue streams.)  I don't believe the cost to be a barrier to me: I average about 0.5p of income per pageview which is 10 times more than the cost of Darwin queries.  And there are new feeds and services available.  For me, with the limited resources I have, my best way forward is to concentrate on the existing and proposed feeds and services from Darwin.

About a year ago I probably would have agreed with you about average revenue per page view and that using Darwin would be viable. However, with any online service funded by advertising I have found there is a point at which an effective cap to growth rates starts and it only increases very gradually from thereon in unless you go nuts with the ad placements. I wrote RTT first and foremost for myself and therefore I try to limit advertising to the absolute minimum where I can – it lends to a cleaner design and it’s easier to find the information you want without being distracted. 

> I've largely been a bystander regarding this forum but do mostly read the posts and am aware of the various developers out there.  Those already working with the Network Rail feeds (I think most of you) are undoubtedly at a crossroads.  Which way to go?  One, the other or both?  I think ATOC/National Rail's Darwin is going to win this one.  I do also think there is a place for more than one system but unless it can be kept cutting edge, provide useful additional functionality and find a marketplace other than for passenger information I think it'll struggle. I agree with Tom Cairns, data from a single source though displayed in various devices and formats can and will become somewhat homogeneous.  But I think that's the point!  You know, it wasn't always standard gauge.

I don’t see there as being a competition, so to speak, between NRE and any other given system. Each has facets in which it is better and facets in which it is worse. Personally, I think a homogeneous environment isn’t a bad thing, but I don’t think it lends itself to much innovation in the ‘realtime future’ side of things. There is a vast amount of untapped potential in the NROD feeds, and likely in NRE's feeds too. One thing from the NRE feeds in particular is that I feel a service that visualises and distributes disruption incidents effectively is just screaming out to be made – and that doesn’t just mean a heatmap… I always wonder “what does this disruption incident actually mean to me?” and I don’t think just a text alert really fits the bill.

In some ways, I think that TfL have got this right. Provide the data and people will come and develop the services that you want to make, but at no cost to you other than the data. Effectively, by doing it, they’ve saved money!

Tom

George Goldberg

Feb 4, 2015, 8:52:21 AM2/4/15
to openrail...@googlegroups.com
I've been observing this thread with a great deal of interest ever since Matt initially asked about the differences between NROD and Darwin, as it's a question that's been on my mind for a few months.

The conclusions I've come to myself are also very similar to Tom's. Initially, the Darwin and other NRE feeds seem very appealing, but on closer examination of the terms and conditions attached, this appeal rapidly dissipates.

The first issue is around cost. Although the "per request" pricing looks small, it actually adds up very quickly to become a substantial amount of money when serving any serious number of end users. It is also very clearly designed with the existing typical use cases in mind (people look up a train, or check the live departure boards for their station), and this means when considering possible novel ways to make use of the data feeds, that may be considerably heavier in terms of the volume and frequency with which they consume data, the effective cost often turns out to be prohibitive.

I can fully understand the need to recover costs, and have no issue with that. However, this takes me on to the pricing of the push port. With charges calculated based on requests between the developer's system and end users, this is clearly not directly reflecting the costs to NRE of providing that data, as this cost does not change relative to the number of end users of the developer's products.

Then there is the issue of the restriction on using alternative forecast data. In a truly "open data" situation, I can understand and even perhaps support the idea of "let's all work together to make one single high quality source of information". However, when that one source of data is clearly a commercial concern, with licensing terms and pricing to match, I do not find that argument convincing at all.

If I were to make a suggestion on how I would change this, it would be:
a) remove the condition limiting the use of NRE forecasts alone, and perhaps replace it with a requirement that it be made clear to end users whether the forecasts are the "official NRE one" or "unofficial".
b) make the push port free regardless of volume, or if a charge really must be levied, a fixed cost for access reflective of the cost to provide the service.


--

George

Chris Northwood

Feb 4, 2015, 9:22:04 AM2/4/15
to openraildata-talk
Whilst it's all admirable for Darwin wanting to be the single source of truth, the only sustainable way for this to be the case would be if Darwin itself was developed as an open-source product, which would allow for all interested parties to contribute to the accuracy of the algorithms, rather than the potentially fragmented future we have now. Whilst it's closed source, there'll always be people interested (and possibly successful) in doing a better job, but they'll always be hampered by the privileged position Darwin has re additional feed access.


Where-in-Sussex

Feb 4, 2015, 9:29:40 AM2/4/15
to openrail...@googlegroups.com, pe...@retep.org.uk
As a daily user of TRUST for delay attribution (and I make these comments from a personal level and not in relation to my employment) I would like to see a TRUST DA feed that has the following info:

Allocation of a delay code to a 'trainid', with the delay/reaction minutes, trains, FTS etc from the bottom section of a TRJF screen (those who use TRUST will understand that!)
Delay code and title
Responsible train
Disputed Y/N

This way the text fields are not sent (no personal information) but we have a delay code and presumably every time part of that info was updated, the TRUST feed would send out a new message.
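
To make the wish-list above a bit more concrete, this is roughly what one message in such a feed might look like, sketched in Python. No such feed exists in this form: the `DelayAttributionMessage` class, every field name, the reading of FTS as "fail to stop" and the sample values are all hypothetical. The point is only that the message carries codes and numbers and no free-text fields.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DelayAttributionMessage:
    """Hypothetical message shape for the fields listed above."""
    train_id: str           # the 'trainid' the delay code is allocated to
    delay_minutes: int      # delay minutes
    reaction_minutes: int   # reaction minutes
    fail_to_stop: bool      # FTS, assumed here to mean "fail to stop"
    delay_code: str         # delay code
    delay_title: str        # title of the delay code
    responsible_train: str  # responsible train
    disputed: bool          # Disputed Y/N


def to_json(message):
    """Serialise a message; incident free text is never part of the structure."""
    return json.dumps(asdict(message), sort_keys=True)


# Placeholder values only; "TG" is used simply because it is mentioned in this thread.
example = DelayAttributionMessage(
    train_id="EXAMPLE-TRAIN-ID",
    delay_minutes=4,
    reaction_minutes=0,
    fail_to_stop=False,
    delay_code="TG",
    delay_title="(title as held in TRUST)",
    responsible_train="EXAMPLE-TRAIN-ID",
    disputed=False,
)
payload = to_json(example)
```

Each time the attribution was updated, for instance when a dispute was raised, a new message of the same shape could be sent with the changed values.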

All the above is information that Network Rail/TOCs quite happily distribute when 3rd parties cause trouble ("so far over xxx trains have been cancelled and xxxx minutes of delay caused due to the burst water main flooding the tracks....").  It would reduce the amount of speculation, as currently if passengers hear of an Operational Incident and see a TG code somewhere, it "must" be a SPAD (and couldn't possibly be a TPWS OSS activation).

The biggie for me, something I get asked about regularly and that I'm pretty sure Tom gets plagued by (although, admittedly, it is of zero passenger benefit), is the obfuscation of freight codes.  As pointed out in this group, the majority of codes are publicly available, and there is nothing to stop somebody parsing these and matching them to a start time and location for P coded schedules.  Not something I'm interested in doing, but you see my point.  I'm not even sure what the original reason was for the obfuscation, but given that any train spotter can tell you all about pretty much any service on the network, I personally see it as an annoyance.

Chris Bailiss

Feb 4, 2015, 9:37:28 AM2/4/15
to openrail...@googlegroups.com
>> If I were to make a suggestion on how I would change this, it would be:
>> a) remove the condition limiting the use of NRE forecasts alone, and perhaps replace it with a requirement that it be 
>> made clear to end users whether the forecasts are the "official NRE one" or "unofficial".
>> b) make the push port free regardless of volume, or if a charge really must be levied, a fixed cost for access reflective 
>> of the cost to provide the service.

To pull some of the threads from my earlier messages together, I think inherent in all this is the fact Darwin is an NRE system:
1) In one sense, NRE being a commercial body are perfectly entitled to charge commercially for their information services.  
2) However Darwin is becoming the industry designated / mandated (somewhat privileged) single truth for public train time forecast information.  
There is a tension / partial contradiction between these two facts once you reach a certain usage volume, because of the charging model - they are pulling somewhat in opposite directions.

At the low-usage end, the service is free (i.e. clearly less than cost) which certainly reflects an attempt to offer a public service, which is commendable.
At the high-usage end, some rethinking is required.  
It all comes down to whether the output of Darwin is viewed as primarily as public information, primarily for the benefit of the travelling public, or not.  

It would be fair of NRE to try to recoup some of their costs of distributing the Darwin information (particularly if it is being used commercially) but not all uses are commercial.

As Peter mentioned earlier, this is all still early days, so hopefully the picture will continue to evolve.


Peter Hicks

Feb 4, 2015, 9:40:22 AM
to Where-in-Sussex, openrail...@googlegroups.com, pe...@retep.org.uk

On 4 Feb 2015, at 14:29, Where-in-Sussex <swe...@gmail.com> wrote:

> As a daily user of TRUST for delay attribution (and I make these comments from a personal level and not in relation to my employment) I would like to see a TRUST DA feed that has the following info:
>
> Allocation of a delay code to a 'trainid', with the delay/reaction minutes, trains, FTS etc from the bottom section of a TRJF screen (those who use TRUST will understand that!)
> Delay code and title
> Responsible train
> Disputed Y/N

I’ll see if I can resurrect the discussions about DA-related messaging. It may come to nothing, but we will get to a state where we either:

* Make a case for having the information made public, albeit in a brief form
* Have a definitive reason why the information can’t be made public

> The biggie for me is something I get asked regularly, and I'm pretty sure Tom gets plagued by (although, admittedly, of zero passenger benefit) is the freight codes obscurification of freight codes. As pointed out in this group, the majority of codes are publicly available, and there is nothing to stop somebody parsing these and matching them to a start time and location for P coded schedules. Not something I'm interested in doing, but you see my point. I'm not even sure what the original reason was for the obscurification, but the fact any train spotter can tell you all about pretty much any service on the network, I, personally, see it as an annoyance.

In the first iteration of Open Data, it was only Class 1 and 2 headcodes which came through in the TRUST and TD feeds. I made the case for having other trains and for freight trains to have their headcodes obfuscated - because knowing there’s *a* train there is more important than knowing exactly *what* train it is.

I agree that the obfuscation isn’t 100% secure - but nobody could come up with a suitable mechanism to make it 100% secure. The closer you try to get to 100% obfuscation, the further away you get from being able to make sense of the data. Putting a headcode of ‘FRGT’ in the TD and TRUST feeds would say there’s a freight train there, but not where it’s going - and you need to know where it’s going to know whether a train’s going to be stuck behind it.

On a positive side, if individual FOCs are happy to make their own data public and tell NR that, there’s no technical reason why this data can’t be made public (and the capability exists).

I’d suggest that anyone who wants freight data to go out ‘in the clear’ for any FOC gets in touch with that FOC and asks them to tell NR that they’re happy for their data to go out in the clear.


Peter


Chris Bailiss

Feb 4, 2015, 9:48:25 AM
to openrail...@googlegroups.com, swe...@gmail.com, pe...@retep.org.uk
The DA data Where-in-Sussex suggested makes sense and was roughly what I had in mind too (plus some additional fields I didn't know existed).

A DA feed would clearly be ideal, however, if that isn't deemed viable/economic, then I would gladly accept a regular (e.g. daily or in the worst case weekly) data extract containing that information.

>> Make a case for having the information made public, albeit in a brief form
By in a brief form, I assume you mean just without the free text fields?  Or can you already see some other data that wouldn't be publishable?

>> Have a definitive reason why the information can’t be made public
Surely delay information has to be one of the short/long-term aims for the OD platform?
BTW, in one of my longer earlier posts I was wondering who owns/operates the DA (is there one organisation that manages it?, in the event of disputes who decides?).
Where is the data mastered?  From Where-in-Sussex it sounds like Trust is the system?  (or a higher level extract from a Data Warehouse somewhere...?).

Chris

Mike Flynn

Feb 4, 2015, 10:21:03 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
>> I don’t see there as being a competition, so to speak, between NRE and any other given system. Each has its own..

You are their competition!  But they're not going to be backed into a corner and/or embarrassed by some pip-squeak developers (and I include myself in this latter category).

There is a massive amount of time and investment developing some of our systems and they maybe don't yet appreciate this. And they're often more advanced than Darwin.  But the elephant in the room IS NRE.  They've always provided the public the train times and they're not going to give it up.  And let's not forget they've also invested heavily.  Think how many mortgages it covers?!

>> Whilst it's all admirable for Darwin wanting to be the single source of truth, the only sustainable way for this to be the case would be if Darwin itself was developed as an open-source product

I agree and I think it is going this way.  Note their recent interaction with this group for example.  But as we all know NRE are, unlike TfL, rightly or wrongly, particularly protective of their data and I don't see this changing any time soon.  I think the 'passionate' developers might continue with Network Rail data but don't, for myself, at the present time, see this route as commercially viable.  Any advances, ie., more accurate passenger information, will of course be incorporated into the Darwin system over time anyway.

>>  ..unless you go nuts with the ad placements..
Another subject but, yes, by and large, the more ads the lower the customer satisfaction.  But you'll also be aware there are many, many other factors involved.  At least as interesting a subject as the railways I think!  But I can still make a good profit.  And if paying for it makes it less likely the rug gets pulled from under me, so much the better.  A constant worry for me, and I guess many of you too.

Tom Cairns

Feb 4, 2015, 10:31:37 AM
to Chris Bailiss, openrail...@googlegroups.com
I think the argument of NRE being a commercial body only works up to a point. NRE was certainly previously funded by the train operators, who in part were funded by Government subsidy and indeed by the fare paying passenger. NRE were reaching a goal of being self sustaining (through advertising revenue) and I’m not sure if they have reached that yet – probably have, but Darwin was certainly initially developed under conditions that they were indirectly receiving Government and passenger money.

Understanding where exactly its primary funding comes from is complicated as, when I’ve looked at Train Information Services Limited's accounts before, there are numerous loans all over the shop between ATOC Ltd/Rail Settlement Plan Ltd/Rail Staff Travel Ltd/NRES Ltd/TISL Ltd. 

--
Tom Cairns

Peter Hicks

Feb 4, 2015, 1:30:46 PM
to openrail...@googlegroups.com
Hi Chris

On 04/02/15 14:48, Chris Bailiss wrote:
> The DA data Where-in-Sussex suggested makes sense and was roughly what
> I had in mind too (plus some additional fields I didn't know existed).
There are a whole heap more, such as STANOX section (pair of STANOXes
between which the delay was incurred) or STANOX if it was over-time at a
station. I haven't got the COBOL copybook to hand, but there are a
number of different messages each with a fairly large set of fields.
> A DA feed would clearly be ideal, however, if that isn't deemed
> viable/economic, then I would gladly accept a regular (e.g. daily or
> in the worst case weekly) data extract containing that information.
I think it's a case of determining:

* Is the data useful? Who will find it useful, and what will it
enable them to do?
* Is it in the public interest to release the data? Does it tell a
story effectively, or does it need other data or prior knowledge?
* Is it politically feasible to release the data? Who needs to sign
off on releasing the data?
* Is it technically feasible to do? Is the data complete? Where are
inaccuracies documented?
* What damage could releasing the data do? (for example, if somebody
misinterprets the data) How do you mitigate it?
> Surely delay information has to be one of the short/long-term aims for
> the OD platform?
The Government's mandate to NR a few years ago was "Open up your
real-time data" and, at the time, it was the TRUST, TD, VSTP, RTPPM and
TSR feeds that I already had access to, because the value of those was
clear.

I can't comment on what the short and long-term aims for the platform
are, because I don't know and haven't looked/asked - however, I will
suggest to NR that they make a statement on what the short and long-term
aims of the Data Feeds platform are (from my point of view of "user of
the feeds" rather than any other).

One thing that springs to mind - maybe we can have a list of data and
datasets which *can't* be published and why - in the spirit of
transparency, knowing the "why not?" shows there's nothing to hide. It
also means that, should somebody come up with a way to get around any
identified risks (and I've found many that I can't argue because they're
fair and reasonable), there's the opportunity to suggest and make change
happen.
> BTW, in one of my longer earlier posts I was wondering who
> owns/operates the DA (is there one organisation that manages it?, in
> the event of disputes who decides?). Where is the data mastered?
> From Where-in-Sussex it sounds like Trust is the system? (or a
> higher level extract from a Data Warehouse somewhere...?).
TRUST DA is a mainframe-based system, owned by Network Rail. Like
several of the mainframe-based industry systems, Atos Worldline operate
it on behalf of Network Rail - it's shared across all operators. So, as
with TRUST, it's Network Rail's data.

I'm not sure how you define 'mastered', but TRUST DA generates delay
alerts, and requires they be attributed in TRUST DA - it then sends data
out to other systems, such as the PSS Data Warehouse (extracts of which
I think should be released) and the data is used for a variety of
reporting purposes.

Being pragmatic, I think the likelihood of having bulk data extracts of
incident and delay data from the PSS Data Warehouse on a weekly or daily
basis is a lot higher than having real-time DA message feeds available.
I'll pass this information on - I looked at it briefly a few months ago,
but we didn't come up with any conclusion on how we could do it reliably.


Peter

Peter Hicks

Feb 4, 2015, 2:27:03 PM
to openrail...@googlegroups.com

On 04/02/15 14:21, Chris Northwood wrote:
> Whilst it's all admirable for Darwin wanting to be the single source
> of truth, the only sustainable way for this to be the case would be if
> Darwin itself was developed as an open-source product, which would
> allow for all interested parties to contribute to the accuracy of the
> algorithms, rather than the potentially fragmented future we have now.
> Whilst it's closed source, there'll always be people interested (and
> possibly successful) in doing a better job, but they'll always be
> hampered by the privileged position Darwin has re additional feed access.
I'm not sure this stands up. It would be great if Darwin were Open
Source, but what would be the benefit? Who would be technically
competent enough to contribute to improving the algorithms and why don't
they work for the companies that run Darwin? How would the quality of
the whole product be assured? Would making Darwin Open Source
automatically mean the additional feeds in to Darwin must be opened up too?

I think there will always be data you can't easily get your hands on,
and we could argue forever about it. For example, Emergency Speed
Restrictions (ESRs) aren't available on the NR Open Data platform, as
they're sent out as free text by other means.


Peter



Mike Flynn

Feb 5, 2015, 9:55:00 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
> Whilst it's all admirable for Darwin wanting to be the single source of truth, the only sustainable way for this to be the case would be if Darwin itself was developed as an open-source product..

>> I agree and I think it is going this way... 

>>> I'm not sure this stands up..

Of course it wouldn't.  What was I thinking?  I was actually thinking NRE might evolve more towards the back-end.  I'm sure their website currently generates a large revenue but I see their share dwindling over time.  It's this area where they have not been able to keep hold of their monopoly.

Nigel Mundy

Feb 5, 2015, 11:03:04 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Regarding the obscuring of Freight headcodes:
" On a positive side, if individual FOCs are happy to make their own data public and tell NR that, there’s no technical reason why (and the capability exists) to make this data public. 

I’d suggest that anyone who wants freight data to go out ‘in the clear’ for any FOC gets in touch with that FOC and asks them to tell NR that they’re happy for their data to go out in the clear. 
"

Would NR be willing to lead by example - ie their Sandite (or however the MPV is fitted out that day)?     I would think that Engineering trains are also "NR is the FOC", even though the motive power and crews are usually sourced from a FOC.   Once the precedent is set, then often others will follow like sheep.

/\/igel

Peter Hicks

Feb 5, 2015, 11:16:50 AM
to Mike Flynn, openraildata-talk
Hi Mike

On 5 Feb 2015, at 14:55, Mike Flynn <mi...@a1publishing.com> wrote:

> Of course it wouldn't. What was I thinking? I was actually thinking NRE might evolve more towards the back-end. I'm sure their website currently generates a large revenue but I see their share dwindling over time. It's this area where they have not been able to keep hold of their monopoly.

Please - let’s try to keep a positive attitude here.


Peter


Mike Flynn

Feb 6, 2015, 5:44:29 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Hmm, I didn't intend to be negative and apologies if I didn't get my point across clearly.

So, following privatisation, Network Rail was set up but didn't run any trains thus avoiding any conflict of interest. I see a similar change happening where National Rail Enquiries become the supplier of rail information data.  The problem as I see it is this current conflict of interest.  They are now being asked to provide data and services to their direct competitors.  And there IS a lot of money involved.  Train times are one of the highest googled search terms.

I don't mean to be controversial in any way but all this is important.  We're all working away with data while the landscape is constantly shifting.  Even though much of it can only be guesswork for me it is certainly worthwhile trying to anticipate how things will pan out and it's good to know what others think also.

Lindsay Bleakley

Feb 6, 2015, 9:34:16 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Hi All,

I have spent some time reviewing the comments, but admittedly, there is quite a lot to read so I haven't gone through it word for word.

I'd like to add a few comments to the discussion.  First and foremost, on behalf of National Rail Enquiries, we recognise and appreciate that in the past we have been much more closed and we weren't resourced to engage with developers, nor were we committed to providing 'open' access to the industry data that we held.  There has, however, been a significant organisational change in NRE, ATOC and the wider industry as a whole.  There is now much more focus on transparency and, as part of that, open access to information.

Admittedly, we still have work to do.  This doesn't happen overnight, but we are listening to the feedback and taking it on board.  There is a great desire to work collaboratively with the developer community now, so please do be patient with us.  At NRE we are only lightly resourced to handle the amount of questions and enquiries we are getting at present but we are endeavouring to improve our service offering every day.

I'd like, briefly, to touch on one issue that has been raised.  The following licence agreement statement: "You must not use Darwin Information in any End User Product that displays different train arrival and/or departure predictions to those derived from Darwin" .

I understand that this specifically is presenting an issue for some of you.  I'd like to explain our stance and stress that we haven't written this clause so as to be difficult or to gain a monopoly over other data providers.  The UK rail industry has invested a significant amount of money to connect all of the in-station screens to Darwin so that all industry train movements feed directly into Darwin and then back out to screens, thus creating consistency of information nationwide.

There are some 66 disparate TOC CIS systems, none of which communicate with each other or with Network Rail.  Any changes made in any of those systems would only ripple out as far as the screens that were controlled by that specific system.  However, for the first time on March 31st 2015, all of the station screens in the UK will be fed directly from Darwin.  This means that if Darwin predictions are used on any "End User Product", that product will show exactly the same information as you will see on the station screens.  That in itself will have a very powerful impact on customer confidence in terms of the data that they are viewing.   Should we remove this specific condition it would be entirely contrary to the national Darwin CIS project that we have been working towards for the past five years.  It's a really exciting project that, when delivered, will have a huge impact on the quality of the data.  This is the same data that we are committed to continuing to provide openly.

I would like to take this opportunity to invite you all to our offices for an informal meeting.  I'd like to present the Industry Customer Information Strategy and allow you the opportunity to discuss it with us and raise any concerns that you have.

Is this something that would be of interest to you?

Would it be worthwhile for you?

Many Thanks
Lindsay Bleakley
National Rail Enquiries

Mike Flynn

Feb 6, 2015, 10:15:09 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
>> ..
I would like to take this opportunity to invite you all to our offices for an informal meeting.  I'd like to present the Industry Customer Information Strategy and allow you the opportunity to discuss it with us and raise any concerns that you have.
..

I think this is a great suggestion and offer. I've no doubt this would be interesting and informative and I for one would be pleased to attend such a meeting.

George Goldberg

Feb 6, 2015, 10:27:50 AM
to openrail...@googlegroups.com
On 6 February 2015 at 15:15, Mike Flynn <mi...@a1publishing.com> wrote:
I think this is a great suggestion and offer. I've no doubt this would be interesting and informative and I for one would be pleased to attend such a meeting.

I agree fully with what Mike has said above, and would also be pleased to attend.

--

George

Tom Cairns

Feb 6, 2015, 10:34:34 AM
to Lindsay Bleakley, openrail...@googlegroups.com, pe...@retep.org.uk
I agree that a meeting would be a good idea but I do have one item to touch upon.

While the industry’s direction is admirable, that shouldn’t mean that NRE should actively (through its licence conditions) be acting to prevent alternatives from being made as this, in my eyes, means that the industry is taking an interesting stance. Given that there are no disagreements, as far as I have seen, that Darwin isn’t always correct – surely the only source of confidence is that the values are the same, not that they are correct? I think calling it a powerful impact is, to be honest, somewhat overestimating it.

I’ll reiterate my previous point that if you use the NROD data feeds solely, the missing links do not include Darwin forecasts – they include pieces of data that Darwin is fed by TOCs which can only influence said forecasts. It’s an important difference. 

Tom

--
Tom Cairns

--

Chris Bailiss

Feb 6, 2015, 10:56:40 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Hello Lindsay

Thanks for getting involved in the discussion.

I would also be interested in such a meeting.  I can imagine many/most people involved in the discussion here have day jobs, so could I suggest the end of a working day might be a good idea, if this is possible.

Chris

Peter Hicks

Feb 6, 2015, 5:44:11 PM
to Tom Cairns, Lindsay Bleakley, openrail...@googlegroups.com, pe...@retep.org.uk

On 6 Feb 2015, at 17:34, Tom Cairns <t...@swlines.co.uk> wrote:
While the industry’s direction is admirable, that shouldn’t mean that NRE should actively (through its licence conditions) be acting to prevent alternatives from being made as this, in my eyes, means that the industry is taking an interesting stance. Given that there are no disagreements, as far as I have seen, that Darwin isn’t always correct – surely the only source of confidence is that the values are the same, not that they are correct? I think calling it a powerful impact is, to be honest, somewhat overestimating it.

Nobody is denying Darwin isn’t 100% correct all the time - but why is it so important that predictions are sometimes wrong?  And furthermore, how often are your own predictions from RealTimeTrains wrong?  Are you prepared to put some work in to that to be transparent and open?

Consistency, as I’ve said before, is important so everyone’s presenting the same data.  As a non-tech savvy rail passenger, I don’t want different apps showing different times, because then it’s up to me to work out which is ‘best’.  I want to be able to look at a departure board at a station and see the same data that I saw on my mobile phone 30 minutes earlier when I left home.  The innovation developers can add here isn’t around the core science, it’s around innovative uses of the data.

Finally, with anyone being able to sign up to receive data from Darwin, it’s going to be really easy for anyone to analyse where Darwin’s predictions need work.  People can also validate that Darwin is reporting correct ‘actual’ times by looking at the TRUST and TD data that we have from Network Rail - this ticks off two transparency issues nicely.


Peter


Tom Cairns

Feb 6, 2015, 7:36:44 PM
to Peter Hicks, Lindsay Bleakley, openrail...@googlegroups.com, pe...@retep.org.uk
I have a service that I like to think I can stand by most of the time, as I have mentioned before. In order to improve it, there are certain bits of data that I would like to use – and also certain parts I would not like to use. The predictions on Realtime Trains have issues due to this (and some other issues which I am already resolving) and I am continually working towards improving them, as I’m sure you do with different parts of Open Train Times.  When I have more time, I will consider putting some of it into releasing some reliability figures – perhaps an independent third party would like to come up with some ideas on how to test it suitably (it wouldn’t be appropriate for me to do so), but as I said before: it’s not a competition. 

Nevertheless, I think it’s highly disingenuous and perhaps complacent to suggest that developers cannot offer benefits to the 'core science’, as you put it. Innovation can be made in many areas, including that, and to say anything else is possibly going so far as to hold the industry back. There will be people in the future who come along and may be able to create even more changes of thinking through innovation and to do that, really, you shouldn’t place any restraint on them. I know that at my own alma mater there is plenty of research into the core science of rail delays and the like, and too on things like traffic management of which predictions are a core part as you’ve previously said, so I think there’s plenty to come in that area.

I don’t wish to be negative but at the end of the day, Peter, you are in the industry as part of your work and have more knowledge and access to certain parts of it than the rest of us and use this to your advantage. I’m personally in a position that I have a service that hundreds of thousands of people use on a monthly basis – using solely data available openly or that I’ve had to manually collate – and the financial sums of doing much else than this as I’ve stated before simply do not work. As far as I can tell, you have appointed yourself as our semi-direct liaison with Network Rail and have an apparent influence with NRE and I sometimes wonder, and I know I am not alone, if because of these factors you have lost sight of what some of us would and, more importantly, can and can’t do.

Tom

--
Tom Cairns

Chris Northwood

Feb 7, 2015, 7:55:27 AM
to openrail...@googlegroups.com
My main issue with Darwin's LDBWS so far (and I've had a token for a few years) is that the API calls you can make are quite restrictive: they've been aimed at one purpose only, showing you a departure/arrival board in a particular form at a station. My work with the NROD feeds so far has just been because the data is available in a much more useful format (ie, you can construct your own data structures based on the events coming through, rather than being restricted to the data format the LDBWS gives you) so you can slice it up and do different things with it. I'm hoping Darwin's Push Port will solve this issue.
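
To illustrate what "constructing your own data structures from the events" can look like, here is a minimal Python sketch. The `MovementEvent` fields (`train_id`, `location`, `event_type`, `planned`, `actual`) are hypothetical names chosen for illustration and are not the real NROD or Darwin message schema. It keeps the latest departure event per train at each location and orders them by planned time; times are assumed to fall within one day, so midnight wrap-around is not handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MovementEvent:
    # Hypothetical fields for illustration only; not the real feed schema.
    train_id: str
    location: str
    event_type: str  # "ARRIVAL" or "DEPARTURE"
    planned: str     # "HH:MM"
    actual: str      # "HH:MM"


def minutes(hhmm: str) -> int:
    """Convert "HH:MM" to minutes after midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def build_boards(events):
    """Group departure events by location, keeping the latest event per train."""
    latest = {}
    for ev in events:
        if ev.event_type == "DEPARTURE":
            # A later event for the same train and location replaces the earlier one.
            latest[(ev.location, ev.train_id)] = ev
    boards = defaultdict(list)
    for (location, _train_id), ev in latest.items():
        boards[location].append(ev)
    for rows in boards.values():
        rows.sort(key=lambda ev: minutes(ev.planned))
    return dict(boards)


def lateness(ev: MovementEvent) -> int:
    """Minutes late (positive) or early (negative) against the planned time."""
    return minutes(ev.actual) - minutes(ev.planned)
```

Because the events are kept as plain records, the same stream could be regrouped by train, by operator, or by lateness without going back to the source.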

On the open source point I raised, one of the things I was hoping would emerge from the community is some sort of 'OpenDarwin' - an open source re-implementation of Darwin based on the NROD feeds - similar to the OneBusAway project in other open transport areas. Tom's RTT is probably the closest to this but is closed source (afaik?). The project I'm working on at the moment is constructed in such a way that I want to be able to generate departure boards, running details, push alerts etc for more than just rail - Manchester Metrolink is opening up their data slowly (and TfGM are committed to open data), Manchester's Metroshuttle has open GPS co-ordinates that are hard to do anything with directly but you could write something which emits departure/arrival events similar to NROD (and we already have schedule data) and see what it comes out with. I imagined an OpenDarwin would be a useful base (if architected in such a way that there were good layers of abstraction) for such a project. Similarly, if Darwin itself was open sourced, that could itself potentially provide a useful base (I know little about Darwin's architecture behind the scenes).

I'm a bit of an evangelist when it comes to open source, which is why I brought the point up :) Yes, most successful open source projects have large corporate backing, and I imagine a Darwin developed openly wouldn't change much day-to-day, but there may be interested parties who could try different prediction methodologies within the Darwin engine. If there was some sort of prediction accuracy test suite as well, that would become very interesting (e.g., a PhD student who tries throwing new machine learning techniques at it...)

Dave Butland

Feb 9, 2015, 4:40:56 AM
to openrail...@googlegroups.com
Lindsay, 

I would be very interested in hearing your strategy and having an opportunity to contribute.

As a slight aside I think you have a long way to go with integrated data though. Your App and screens do not display the same information. I've noticed, especially at Reading, that the App appears to be using a pessimistic prediction whilst the on-station data appears to be optimistic.

Dave.

Peter Hicks

Feb 9, 2015, 2:06:03 PM
to Tom Cairns, openrail...@googlegroups.com
Tom,

On 7 Feb 2015, at 00:36, Tom Cairns <t...@swlines.co.uk> wrote:
I have a service that I like to think I can stand by most of the time, as I have mentioned before. In order to improve it, there are certain bits of data that I would like to use – and also certain parts I would not like to use. The predictions on Realtime Trains have issues due to this (and some other issues which I am already resolving) and I am continually working towards improving them, as I’m sure you do with different parts of Open Train Times.  When I have more time, I will consider putting some of it into releasing some reliability figures – perhaps an independent third party would like to come up with some ideas on how to test it suitably (it wouldn’t be appropriate for me to do so), but as I said before: it’s not a competition. 

I admire the fact you vehemently defend the website you’ve spent a lot of time and effort building.  I think you’re still glossing over the point of consistency versus accuracy, possibly because you’ve put so much work in to a very specific area and because you have a loyal user-base of, from what I can tell, mostly ‘advanced users’.  That said, now is not the time to go against the industry when it comes to predictions.  I realise you may be feeling a little downtrodden because Darwin has finally been opened up and you may lose one of your website’s unique selling points - but as I’ve pointed out to you before in private, this was a situation that I, and others, could see coming at some time.

When you say “it’s not a competition” - what are you referring to?

From the transparency angle, if you could release a week or a month’s worth of the RealTimeTrains ‘push port’ data in to the public domain, we can compare it to the same period’s worth of data from Darwin and let the group come to its own conclusions on who is more accurate.  I think a hard metric on accuracy is important here.
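
One possible "hard metric" is the mean absolute error, in minutes, of each source's predictions against the recorded actual times. The following is only a sketch under stated assumptions: the dictionary layout keyed by `(train_id, location)` and the source names are hypothetical, and neither feed is claimed to deliver data in this shape. Only events predicted by every source are scored, so no source is penalised for coverage.

```python
from statistics import mean


def to_minutes(hhmm: str) -> int:
    """Convert "HH:MM" to minutes after midnight (no midnight wrap-around)."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def mean_absolute_error(pairs):
    """pairs: iterable of (predicted "HH:MM", actual "HH:MM")."""
    return mean(abs(to_minutes(p) - to_minutes(a)) for p, a in pairs)


def compare_sources(predictions_by_source, actuals):
    """Score each source on the events that all sources predicted.

    predictions_by_source: {source_name: {(train_id, location): "HH:MM"}}
    actuals:               {(train_id, location): "HH:MM"}
    Both layouts are assumptions made for this illustration.
    """
    common = set(actuals)
    for preds in predictions_by_source.values():
        common &= set(preds)
    if not common:
        raise ValueError("no events common to all sources")
    return {
        source: mean_absolute_error((preds[key], actuals[key]) for key in common)
        for source, preds in predictions_by_source.items()
    }
```

A fair comparison would also need the predictions to be sampled at the same lead time before each event, since a forecast made two minutes out is easier than one made thirty minutes out; that step is left out of the sketch.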

It would also help if you could contribute to the discussion by outlining the data you want to use and where this data can be sourced from - we may well find other people who want the same data for other purposes, so more than one person benefits.  There’s a Google Form which I posted a link to last week - for which the results are hidden at the moment until I can work out how much people are willing to discuss in a public forum.

Nevertheless, I think it’s highly disingenuous and perhaps complacent to suggest that developers cannot offer benefits to the 'core science’, as you put it. Innovation can be made in many areas, including that, and to say anything else is possibly going so far as to hold the industry back. There will be people in the future who come along and may be able to create even more changes of thinking through innovation and to do that, really, you shouldn’t place any restraint on them. I know that at my own alma mater there is plenty of research into the core science of rail delays and the like, and too on things like traffic management of which predictions are a core part as you’ve previously said, so I think there’s plenty to come in that area.

I’m not suggesting that developers can’t build upon the core science of predictions, rather I’m saying that the accuracy isn’t the focus right now.  It will be in the weeks and months to come, and the contribution you, and other highly talented people can make is by influencing and working on the algorithms and code that produces predictions.  However, please don’t lose sight of the bigger picture - research and development is important overall, but right now and in the short-term, the opening up and evolution of Darwin is, at least as I see it, even more important.

I can’t see any point in trying to reinvent the wheel and create a separate source of truth to Darwin - it will be wasted effort, and I think you’re better off highlighting where Darwin isn’t as accurate as RealTimeTrains so somebody, somewhere can work on a solution that’ll benefit everyone.  However I fear the elephant in the room is that you might want to sell your predictions service to NRE.  To do that, you need to embrace what the industry is doing and work with it, not against it.

I don’t wish to be negative but at the end of the day, Peter, you are in the industry as part of your work and have more knowledge and access to certain parts of it than the rest of us and use this to your advantage. I’m personally in a position that I have a service that hundreds of thousands of people use on a monthly basis – using solely data available openly or that I’ve had to manually collate – and the financial sums of doing much else than this as I’ve stated before simply do not work. As far as I can tell, you have appointed yourself as our semi-direct liaison with Network Rail and have an apparent influence with NRE and I sometimes wonder, and I know I am not alone, if because of these factors you have lost sight of what some of us would and, more importantly, can and can’t do.

I think you’re making three points here:

1. That I am more knowledgeable than other people and take advantage of this
2. That I am, somehow, personally benefiting from feeding back information to Network Rail on what developers are doing, in order to keep them up-to-date with goings on and the good work that’s being done with their data
3. That I don’t know what the developer community wants

On the first point, I think I’ve consistently demonstrated, and without being asked to do so, that I use my knowledge to help others.  The number of posts I’ve replied to, the amount of content I’ve put on the Open Rail Data Wiki - I can’t see how spreading this knowledge around, where I can (because there are NDAs and confidentiality agreements), and helping the community is somehow advantageous to me.

On the second point, I am pre-empting Network Rail having to build up and maintain relationships with a number of individual developers, and aggregating requests for data, helping them steer their Transparency programme.  There is nothing stopping anyone else from getting in touch with Network Rail directly, as I’m sure they have, so I can’t see why my volunteering to help the community *and* to help Network Rail is a bad thing.

On the final point, that’s your personal view, which you’re entitled to - but one which I think is wholly incorrect.  You may think that, because I know the industry and can skip over the workings-out and get to an answer that is not what you want, that I’m somehow stopping more data being opened up.  If that’s the case, you’re wrong - there are limitations to what can be done now, so it’s probably better to focus on the quicker wins whilst still keeping sight of the longer-term strategic objectives.

Finally, please stop attacking me because you see me as an ‘easy target’, and please start to understand the rail industry’s position and the influence you can have to make a positive difference.  You’ve put a lot of work in to your successes - you need to let other people be successful in their own ways too.  I’ve had emails of support, off-list, from others who can’t work out why you’re being negative both toward me and toward NRE and its objectives.

I hope I’ve clarified my position to you and maybe helped you to see that my intentions are quite a lot more altruistic than you believe they are.  I spend many hours a week helping others use the Network Rail data, and I’ll continue to do this with the Darwin Push Port data, because I have hands-on experience of that too, and want to help others work with it and be successful.



Peter

George Goldberg

Feb 9, 2015, 3:24:06 PM
to Peter Hicks, openrail...@googlegroups.com
On 9 February 2015 at 18:52, Peter Hicks <peter...@poggs.co.uk> wrote:

<snip>
 
On the second point, I am pre-empting Network Rail having to build up and maintain relationships with a number of individual developers, and aggregating requests for data, helping them steer their Transparency programme.  There is nothing stopping anyone else from getting in touch with Network Rail directly, as I’m sure they have, so I can’t see why my volunteering to help the community *and* to help Network Rail is a bad thing.

Hi Peter,

As a relative newcomer to this community without any clear picture of who works on what (and for whom), I've been left a little bit confused by some of the discussions in this thread. I initially assumed you were representing National Rail Enquiries in this discussion, but I now realise that I probably jumped to an incorrect conclusion there. I think it would be hugely beneficial to me and to other relative newcomers if affiliations were made a bit clearer on this mailing list.

(In the spirit of that - I'm a PhD student working on modelling disruption on rail networks, primarily working with TfL rather than the NR or NRE, but I also have a casual interest in information provision on national rail, and doing things with open data in general).

--

George

Martin Swanson

Feb 9, 2015, 3:33:29 PM
to Dave Butland, openrail...@googlegroups.com
Lindsay

>>> There is now much more focus on transparency and, as part of that, open access to information. 
Do you have a roadmap we can review of what closed data you have and a proposed roadmap for opening up? 

Regards,

Martin

Tom Cairns

Feb 9, 2015, 4:58:52 PM
to Peter Hicks, openrail...@googlegroups.com
Well, first of all, if you have taken it personally, or anyone who read it thought it was intended to be so, then I apologise to you and them - this entire debate is quite political, and the position one appears to take can at times lead to what seem to be incorrect conclusions. Secondly, I’d like to make it clear that I do think the changes that were made by NRE were actually very good for a lot of people but that, as everyone reading this knows, I have a number of issues with its licence.

To make it clear, the missing piece in the puzzle, as I see it, for the NROD feeds is that of short notice alterations, cancellations, etc. All of this data is entered by either a TOC, Network Rail or at their request. None of the missing pieces involve Darwin directly to any great degree other than Darwin Workstation being one of the available inputs into passenger facing systems. My understanding is that the work to bring most of these together and then out to the CIS screens was primarily funded by central Government. I may be wrong, but I cannot find any information contradicting it – indeed, discussions I have had personally with others have suggested as much. The DfT open data strategy [1] hints that they’d like things like this to be open and I’m sure I’ve seen, although I can’t locate it at the moment, a Cabinet Office document about projects that have used government funding and open data.

To use a food metaphor, I think of the NROD feeds as something of a “here’s the ingredients, make your own recipe” and Darwin as a “here it is already baked”. Unfortunately, the baked version has a secret sauce that is able to benefit only those using it and those using only ingredients simply can’t get the same outcome. I think it is in the interests of everyone who uses the NROD feed, and who wishes to be able to find completeness in it, that more continued work is done towards getting that missing piece publicly available on the same terms as that of the existing feeds. It seems somewhat contrary to logic that in order to complete it, one has to effectively try and filter the various ingredients that you want out of an already baked product. I’m certainly personally continuing to seek the path that we can get that secret sauce separately on reasonable conditions as others will be able to attest to.

The other issue is predictions. There are two facets to this as I personally see it - ‘one version of the truth’ and also the licence requirement of using them. These appear to be the same thing but I don’t personally see them as that. I’ve never denied that one version of the truth is an admirable aim and I do genuinely see the benefit for a vast majority of rail users but I don’t really see those who use NROD feeds to get the additional value out of them as targeting the typical rail user. There will always be those who seek an alternative no matter what moves the industry makes – but that doesn’t mean they will gain much, if any, traction. My thoughts about the licence requirement of using them I’ve detailed previously – I personally think there are issues: some developers may want to show their own predictions, there will be gaps in what Darwin provides vs what is needed, etc.

As for the general ecosystem, I maintain that R&D is just as important as, and I could be persuaded to say more than, the release of data itself and we all should encourage everyone to make what they want to make when they want to make it. The early adopters will have a head start but some of them will be dissuaded by irrelevant restrictions – it’s not going to be possible to get everyone on the same hymn sheet and I don’t think that anyone should expect it. However, if one thing proves to be better than another, and we all see this in many industries, then it’s human nature to converge towards that without needing to be required to do so.  In a way, it feels like the barrier to entry for rail data itself has been significantly lowered (and that is a very good thing!) but that a barrier to innovation is still very much up there.

All of what I’ve said above is my personal view from an altruistic stance: if Realtime Trains didn’t exist, and to be honest I’ve considered shuttering it a few times over the last few years, then I would still have that above view. Now, using Darwin data as an ongoing commercial concern is different and I base this on my experience with Realtime Trains. Depending on the product, usage of Darwin, and a service's/product's own consumers, reaching 5,000,000 ‘hits’ per railway period is not particularly difficult! I’ve just gone back to the costing sheet I did for RTT’s potential use of Darwin and looked back in history a little bit to see where the charge point would have been hit – this is based on both pull and push as ultimately the charge uses the same mechanism for RTT’s purposes.

The point at which I would have been charged, had I been using Darwin data on the new pricing basis from the get go, was September 2013 (11 months since RTT launch) – the point at which the cost outstripped revenue would have been the end of 2013 (14 months). Obviously we are now over 2 years down the line and the costs would now sit at £85,000 per year. To make it totally clear, the revenue I make from RTT pays for my rent, food and itself – it’s a good project to work on and I’m quite happy with how it is: I can’t believe that anyone would take a consideration that I would want to sell parts of it to NRE seriously as that would add restrictions to what I can do! :-)
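For anyone wanting to run the same check against their own traffic, a minimal Python sketch follows. It assumes only the rule from the NRE announcement (more than five million uses in a four-week period makes you a high volume user); the function name and the hit counts are invented for illustration and are not Realtime Trains' real figures, and the actual charging tiers are not modelled because they are not stated here.

```python
# Threshold from the NRE announcement: more than five million uses
# in a four-week period counts as high volume.
HIGH_VOLUME_THRESHOLD = 5_000_000


def first_chargeable_period(hits_per_period, threshold=HIGH_VOLUME_THRESHOLD):
    """Return the 1-based index of the first period over the threshold, or None."""
    for index, hits in enumerate(hits_per_period, start=1):
        if hits > threshold:
            return index
    return None


# Illustrative figures only: hits in each successive four-week period.
example_hits = [1_200_000, 2_900_000, 4_800_000, 5_300_000, 6_100_000]
```

With these made-up numbers the fourth period is the first one over the line; a steadily growing service would cross it once and then stay above it.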

A few comments which I’ve had off group suggested that many of these hits come from requests involving freight services and the like but, unfortunately, I actually excluded these from the numbers and if they were included then I’d be looking at closer to £100,000 per year – but that’s purely academic as Darwin doesn’t include that information. It strikes me that, as an independent developer, if you want to take a product to market and be successful then you need to have a very solid and guaranteed revenue source in place from the start (advertising is neither of these). Realtime rail information alone doesn’t particularly lend itself to a huge revenue source – added value services do – so the incentives to bring fantastic new products, if you continue down one particular path and improve it, quickly disappear.

On to what I want to do sitting with both hats on – well, I want to be able to get that missing bit of data openly: I don’t think there’s any reason to give up trying to get it. I want to continue to innovate my own projects in the areas I want to work in and not have to wait until it becomes convenient for a source to drop restrictions on me doing it. Financially, this is the only option I can take but it’s the option I want to be able to take – the idea of solving interesting programming and mathematical challenges for myself really drives me. And I’m sure others are the same.

On a side note, I have worked with Darwin data on client projects. Other than what I’ve said previously, and what others have said regarding output formatting, etc., I think it’s pretty good for what its stated goals are. It’s very good for getting little personal projects and their ilk available and people should use it for those purposes. My concerns just rear their head when you become a bit more involved and further down the line both conceptually and in time. 

Tom


--
Tom Cairns

Paul Kelly

Feb 9, 2015, 7:45:28 PM
to openrail...@googlegroups.com
I've been following this discussion with interest, but haven't found
time to contribute until now. Some intriguing points have been made by
various parties and deserve answers.

As someone suggested it wasn't clear where everyone was coming from and
what their "agendas" were, I'll try and briefly summarise for myself: I
run the sites brfares.com and brtimes.com. Nearly all the data behind
these sites I get from the ATOC open data site at
<http://data.atoc.org/>; brtimes.com also mashes in daily schedule
updates from Network Rail data. I don't use any of the STOMP-based data
feeds in the sites, although it's a vague future aim to include the VSTP
and Train Movement data in brtimes.com.

The National Rail Enquiries data feeds don't really appeal to me; I see
them as more of an "open API" than "open data" really. From
data.atoc.org I can download full, raw, complete schedule and fares data
files and do unlimited analysis on them on my own server without having
to make any API calls or place any further load on ATOC servers.

In fact I feel that the ability to download data for off-line analysis
and to use it in ways that have not been envisaged by the provider of
the data is really central to the whole ethos of open data. However I do
like the sound of the NRE "push port" and look forward to learning what
that entails.

But I would be very concerned if the provision of "raw" data on
data.atoc.org turned out to be endangered as a consequence of the
enhanced NRE data feeds---not least because my sites rely on it! I would
certainly be interested in attending a meeting at National Rail
Enquiries to learn more about the thinking behind and strategy for
development of the various APIs and it would be great if there was
someone there who could talk about the situation with data.atoc.org too.

I agree with those who say the additional inputs Darwin has over Network
Rail's systems should be made open independently of Darwin. Perhaps if
Network Rail had access to these inputs, it could make them available
through the NROD platform? As I see it, it would provide a level playing
field for other providers of real-time information and predictions and
allow for more accurate comparison of prediction algorithms.

I strongly disagree with the sentiment that anyone who had the
capabilities to work with the data and develop improved prediction
algorithms should be working for the companies that develop Darwin. In
fact I think that's just daft - there are many, many skilled software
developers with an interest in and knowledge of railways who would be
capable of working with this data but who are happily employed
elsewhere. Many of them are on this list!

I don't see any need to push for Darwin to be open source. Even with the
new wind that seems to be blowing through NRE, the time is not yet ripe
for that. There has been quite a bit of talk about Darwin's forecasting
system. How does it actually work? Certainly I have seen examples of its
predictions that don't look like much more than assuming a late-running
train will catch up by the sum of pathing, engineering and performance
allowances en-route---but maybe there is nothing more to go on in a lot
of situations. In my experience its real value comes when there are
short notice platform changes at large terminus stations, but as others
have said that's really just down to the extra sources of information it
has access to. It could easily be argued that discrepancies such as the
example mentioned (of a unit swap at Paddington not having been notified
to the signaller) makes a mockery of the idea of a single source of truth!
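
To make the "catch up by the sum of allowances" idea concrete, here is a
minimal Python sketch of that naive forecast. It is only my reading of
the behaviour described above, not a description of how Darwin actually
works; the function name and the allowance figures are made up.

```python
def predicted_delay_after_recovery(current_delay_min, allowances_min):
    """Naive forecast: lateness shrinks by the total allowance still ahead.

    allowances_min holds the pathing, engineering and performance
    allowances (in minutes) between the train's current position and the
    location being forecast. The result is never less than zero, i.e. the
    train is not predicted to run early.
    """
    recoverable = sum(allowances_min)
    return max(0.0, current_delay_min - recoverable)


# Hypothetical example: 10 minutes late with 1 + 2 + 0.5 minutes of
# allowances still to come before the next calling point.
example_delay = predicted_delay_after_recovery(10, [1, 2, 0.5])
```

Anything smarter (knock-on delays, platform conflicts, unit swaps) needs
inputs beyond the timetable, which is where the extra data sources come
in.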

The other thing I wanted to say is that I just don't get the
"consistency is better than accuracy" argument. I really don't. Perhaps
from an NRE point of view (i.e. one thing fewer for passengers to
complain about), but not from an open data point of view - why is it OK
if Darwin gives wrong information to passengers, but not OK if a third
party site does? If passengers find that they can't trust a particular
site or app then they don't have to use it! I feel that stifling
competition stifles innovation - the mind boggles at all the interesting
and exciting ways delay predictions could be calculated.

I like the idea of switching between pure Darwin forecasts and forecasts
by a different algorithm. They could even be shown side by side in an
app - that would be very interesting indeed. If everyone has to use the
exact same data, predictions etc. then I predict that the interest of a
lot of the more serious developers will be lost - where's the fun in it
if all you'd really be doing is re-skinning NRE? I really hope the move
towards the NRE APIs doesn't result in a loss of innovation.

Best regards

Paul

Paul Kelly

Feb 10, 2015, 5:09:21 AM
to openrail...@googlegroups.com
Just been reading over my mail from last night and I'd like to elaborate
on this point:

On 10/02/15 00:45, Paul Kelly wrote:
>
> The other thing I wanted to say is that I just don't get the
> "consistency is better than accuracy" argument. I really don't. Perhaps
> from an NRE point of view (i.e. one thing fewer for passengers to
> complain about), but not from an open data point of view - why is it OK
> if Darwin gives wrong information to passengers, but not OK if a third
> party site does? If passengers find that they can't trust a particular
> site or app then they don't have to use it! I feel that stifling
> competition stifles innovation - the mind boggles at all the interesting
> and exciting ways delay predictions could be calculated.

The more I think about this the more I can both:
a) see things from NRE's point of view, and
b) see that the same logic does not apply to the open data ecosystem.

When Darwin supplies passenger information systems, train describers
etc. at stations it has a captive audience. These systems are seen as an
inherent part of the railway and certain standards are expected of them.
If station systems are inconsistent with TOC websites or NRE then I
understand that it can be embarrassing for the TOCs, who are NRE's
ultimate owners. I can really see why consistency is important here.

Websites and apps using open data feeds do NOT have a captive audience.
Those that people find useful will become popular, while others will
wither and die. That's how the open source ecosystem works. It is my
opinion that restricting innovation in this ecosystem through
non-standard licensing conditions will seriously disrupt the normal
processes, and make it less likely that the quality and diversity seen
in other environments will fully develop.

Perhaps NRE has already factored this in and decided that from their
point of view it is a compromise worth making?

Paul

Lindsay Bleakley

Feb 10, 2015, 5:20:27 AM
to openrail...@googlegroups.com, dgbu...@gmail.com, lindsay....@atoc.org
Hi Martin,

The information about our available data feeds can be found at the following address http://www.nationalrail.co.uk/46391.aspx

The primary data feed that we have which was not open in the past is the Darwin Push Port.  We announced that we had opened it in June 2014, but were struck by the demand, so had to delay its release until 31st March 2015 to allow us to build an open access syndication portal.

In reply to all, I have read a number of responses from people expressing interest in the meeting, so will look to set up a 2 hour meeting towards the end of the day in the coming weeks.  I will post it on the Google Groups and on the NRE LinkedIn Forum (https://www.linkedin.com/groups?home=&gid=4831364). 

In reply to Dave.  There is an aggressive roll out of the Darwin CIS programme happening currently.  Darwin is not yet connected to the Reading CIS.  When it is connected, Darwin will consume all the data, including the predictions, platform numbers, special notices, train formations etc, in the Reading CIS.  It will then use a set of logic to determine which prediction is most accurate and that information will then be displayed concurrently on the screens and across all Darwin outputs. 

Also, in reply to Tom.  There is most certainly value in the NROD feeds, and in products / services that are built off those feeds.  However, the industry vision is that Darwin provides all TOC systems with customer facing train running information.  There has never been one consistent picture of train running across the industry, simply due to the number of disparate systems there are.  Once all the systems are connected for the first time, we will have that consistency.  Once that consistency exists, then accuracy will improve because we will have all the TOC staff in the country looking at the same information, ready to tell us when they notice something wrong.  Prior to that, a TOC would go to the CIS provider to make local changes in their own CIS system to fix issues, which then wouldn't ripple down to the rest of the systems.  A central system allows one change to be made at source.  This is what the industry believes is the right thing to do.

The other point you mention is that there has never been a statement from NRE saying that Darwin is always right.  That is because Darwin can't always be right, nor can other versions of Darwin.  By the very nature of the word "prediction" you are making a "best possible estimate".  For example, in a case where a train is stuck because of a flood and the driver doesn't know how long the train will be there, then certainly Darwin can't be expected to know.  All anyone can do is to provide estimates based on the information that is available and the information learned from past movements.

Anyhow, I look forward to meeting those of you who can make it in a few weeks.  I'll announce further details when I find a suitable time and date.

Many Thanks
Lindsay

Peter Hicks

Feb 10, 2015, 5:28:31 AM
to Paul Kelly, openrail...@googlegroups.com
Hi Paul

On 10 Feb 2015, at 10:09, Paul Kelly <pa...@pdkelly.de> wrote:

> When Darwin supplies passenger information systems, train describers etc. at stations it has a captive audience. These systems are seen as an inherent part of the railway and certain standards are expected of them. If station systems are inconsistent with TOC websites or NRE then I understand that it can be embarrassing for the TOCs, who are NRE's ultimate owners. I can really see why consistency is important here.
>
> Websites and apps using open data feeds do NOT have a captive audience. Those that people find useful will become popular, while others will wither and die. That's how the open source ecosystem works. It is my opinion that restricting innovation in this ecosystem through non-standard licensing conditions will seriously disrupt the normal processes, and make it less likely that the quality and diversity seen in other environments will fully develop.


Equally, remember that this data release is a big step for the industry and there is potentially a lot at stake, particularly to do with quality and reputation. There are enough sticks out there to beat the railway with anyway, and plenty of people willing to do it.

There’s a gap between “what the industry is comfortable doing now” and “what some people in Open Data circles want”, and that gap is too large to close right now. Small steps are, I believe, the way forward - it allows both sides to engage and understand what’s important to everyone.

So, whilst we may have a release of data looming that’s not quite suitable for what a few people want to use it for (and we haven’t heard from the silent majority who will be fine with what’s being released), that doesn’t devalue the entire data release, nor mean anyone’s ‘bad’ for releasing data like this. It just means we need to consume and feed back our findings.


Peter



Mike Flynn

Feb 10, 2015, 5:28:46 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Following on from what Paul has just said.  Could you imagine 'competing' departure and arrival boards in a station concourse?  Of course not, there'd be chaos!  But whilst the same is not true of websites and apps I think it's going to be difficult to 'compete' for those displaying 'unofficial' timings.

It is also a highly political debate, there's just no avoiding the fact.  For this reason I personally am taking a pragmatic view and happy and grateful to go with the flow.  Whilst we, as open developers, may be able to shape the debate, any decision making will ultimately be made from above.

I'd also just like to add that open data is not the same thing as free data.  To continue a previous metaphor, there's no such thing as a free meal.

Tom Cairns

Feb 10, 2015, 8:12:58 AM
to Mike Flynn, openrail...@googlegroups.com, pe...@retep.org.uk
On 10/02/2015 10:28, "Mike Flynn" <mi...@a1publishing.com> wrote:
I'd also just like to add that open data is not the same thing as free data.  To continue a previous metaphor, there's no such thing as a free meal.

I just want to add that open data equally does not let you charge on a wholly commercial basis: the charge for accessing it should be at cost (e.g. the cost of delivering the ingredients, and baking if really necessary) or less. The financial value of open data is not easily quantified by the organisation that releases it, which is why NRE being a private organisation, albeit with some effective public service requirements, makes this debate politically fraught. However, the overall potential economic value to the railway and the country can be recognised and noticed, at least in the latter’s case…we wouldn’t have organisations such as the ODI otherwise.

--
Tom Cairns


Mike Flynn

Feb 10, 2015, 9:15:30 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Tom, I totally sympathise with you.  I think there's little doubt that your systems, at least as regards timings, predictions, etc., are more advanced than any other of us developers, and possibly even, though I actually don't know, more accurate than Darwin, and there is a danger that this work could go to waste, as it were.  But you do have options.  Your advanced systems will continue to find a market place outwith general public passenger information.  And there's nothing stopping you, and I'm sure you will, alongside your existing systems, embrace also what is and will become available by way of NRE and Darwin.  

I certainly think you should be given the opportunity to show your times side by side with Darwin.  

But I also think your view on the charging argument is wrong.  The current UK railway set up, whether trains, tracks or information, can in no way be described as any kind of a normal market place.  And I don't think it much matters whether the proposed charges are to cover costs, to make a profit or just to deter time-wasters.  More important, does the structure allow for a good profit to be made, to stimulate innovation?  My personal opinion, and from experience, these proposed charges from NRE are more than reasonable.  If they weren't I would shift my focus more towards Network Rail data.  It's a business choice. 

So the times across the industry will be uniform, nearly all agree a good thing, but their delivery can be in all manner of flavours.   

Tom Cairns

Feb 10, 2015, 10:25:27 AM
to Mike Flynn, openrail...@googlegroups.com, pe...@retep.org.uk
On 10/02/2015 14:15, "Mike Flynn" <mi...@a1publishing.com> wrote:

But I also think your view on the charging argument is wrong.  The current UK railway set up, whether trains, tracks or information, can in no way be described as any kind of a normal market place.  And I don't think it much matters whether the proposed charges are to cover costs, to make a profit or just to deter time-wasters.  More important, does the structure allow for a good profit to be made, to stimulate innovation?  My personal opinion, and from experience, these proposed charges from NRE are more than reasonable.  If they weren't I would shift my focus more towards Network Rail data.  It's a business choice. 

My problems with the charging model are not for my own benefit – I reckon I could find a way of making it work and building the financial basis up over a period of 6-12 months or so, but that detracts from my personal want of working on it: making money is not my modus operandi.  As I commented before, I think there is certainly a cap to be hit on revenue generating out of sites that are purely realtime related unless you go down the road of added value services. There is certainly the potential of using commission and referral links to generate revenue further but I feel that there is a distinct point at which one becomes less impartial compared to what I think we should be when doing this kind of thing. After all, we should all be good citizens when working with this data and each individual’s view as to what that entails will be different.

As for how the railway is set up… I think that it does blur the lines somewhat but the industry needs to think differently about its situation. The stated aim of the high volume charges in the Darwin licensing regime is so that they can support its ongoing running costs. Now, if you ask me, given that it will soon become an integral part of all passenger information which NRE can influence, third parties should not have to incur a cost to receive it at more than the ‘at cost’ of delivering it. I would have thought, given that role, it should be paid for by central Government (although perhaps just guaranteed), NRE (through its existing revenue sources of advertising, etc.) and the train operators – it’s a central part of their passenger information stack, they should have a budget for it and not need to use third parties.

I’ll be honest, I thought your revenue per page view was incredibly high. You can probably calculate mine, I can’t disclose it exactly obviously due to the contractual rules around Adsense and similar programmes, but it is extremely significantly lower than yours. It's been dropping considerably for a long time while page views and overall usage continues to increase…hence my comment about an effective cap in growth of revenue. Nevertheless, I don’t think that profit stimulates the majority of innovation – new ideas, designs, concepts do it far more and if bars are in place to deter real out of the box thinking then the only innovation we’ll get to see is just minor changes and perhaps rethinks on the traditional live departure concept and perhaps the automated delay compensation concepts. There’s far greater potential to be had.

>> Your advanced systems will continue to find a market place outwith general public passenger information.

They probably will, you’re right. Two thirds of users on the website use the detailed mode but RTT’s API (which operates only in simple mode at present) takes a lot of hits and, overall between the two, 90-95% of all queries now are wholly for passenger information. Even if I ignore this, I don’t feel comfortable with the fact that I know it can be wrong in disruption because the most important data is missing. Other people in this position are the same. I think that completeness is extremely important and, while I’m relatively confident it’ll come in time, we shouldn’t rest on our laurels and do nothing until it does.

I genuinely like the fact that people can now use Darwin but for those of us who can’t or don’t want to for whatever reason we need to continue working towards that level playing field of completeness. And by that, I mean all manual inputs and the raw automated data feeds necessary to generate the rest of it. I don’t believe there should be a question of whether people will be forced to choose between completeness with restrictions or raw data which isn’t complete, which right now and for the foreseeable future there will be.

Tom

--
Tom Cairns

Mike Flynn

Feb 10, 2015, 10:57:15 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
>> I reckon I could find a way of making it work and building the financial basis up over a period of 6-12

Yes, you could.  Overnight even.  And shifting to Darwin, even over time, you would keep the vast majority of, and build upon, your current traffic.  But your focus would of course need to shift more to delivery, format and design from that of timing content though I know this puts you at an initial disadvantage.

>> making money is not my modus operandi

But you don't want to!  You might want to shift your principles a bit too.  Hey, I don't want pages stuffed full of adverts any more than the next man.  But try to think of ads as a sort of added value and it's not quite so bad.  And there's as much maths involved in creating a working business model as there is in predicting train times.  But hey, each to their own.

As regards NRE costs and charges, I'd say they are considering the potential impact on revenue from their own sites far more than what they might bring in by way of data provision.  But to that I'd point to where the really big money is, which is whether better passenger information will lead to higher passenger numbers.  And with this in mind, I've no doubt the TOCs, by way of the government, will be the ones re-evaluating the charging structure over time.  They might even come to the conclusion the data should be completely free.

Peter Hicks

Feb 10, 2015, 11:03:36 AM
to Tom Cairns, Mike Flynn, openrail...@googlegroups.com, pe...@retep.org.uk

On 10 Feb 2015, at 15:25, Tom Cairns <t...@swlines.co.uk> wrote:

> I genuinely like the fact that people can now use Darwin but for those of us who can’t or don’t want to for whatever reason we need to continue working towards that level playing field of completeness. And by that, I mean all manual inputs and the raw automated data feeds necessary to generate the rest of it. I don’t believe there should be a question of whether people will be forced to choose between completeness with restrictions or raw data which isn’t complete, which right now and for the foreseeable future there will be.

Here’s an interesting take on it - if you look at NRE as a supplier of ‘shared services’ to an industry (a bit like Tyrell is), would you not need the buy-in of the TOCs supplying the manual input to be able to distribute it?

If I stepped in to the shoes of a TOC, I might be comfortable with the trustworthiness of Darwin and therefore happy to supply data to it via Darwin Workstation - but uncomfortable about making the same inputs available to others who may predict different times to the source I consider official and want my customers to see.

Do we think the DfT need to put Open Data clauses in future franchise agreements to ensure operators make a minimum level of data ‘open’?

Out of interest, is it only the charges for using Push Port data that stop you from using Darwin’s forecasts?  For example, if it were financially feasible (zero-cost or a modest profit) for you to use the forecasts, would you definitely use them?


Peter


George Goldberg

Feb 10, 2015, 11:18:25 AM
to openrail...@googlegroups.com
On 10 February 2015 at 16:03, Peter Hicks <peter...@poggs.co.uk> wrote:

I'm going to chip in with my answers to these questions, as I assume the more feedback there is from this group on the direction we'd like to see things going, the better.

> Here’s an interesting take on it - if you look at NRE as a supplier of ‘shared services’ to an industry (a bit like Tyrell is), would you not need the buy-in of the TOCs supplying the manual input to be able to distribute it?

> If I stepped in to the shoes of a TOC, I might be comfortable with the trustworthiness of Darwin and therefore happy to supply data to it via Darwin Workstation - but uncomfortable about making the same inputs available to others who may predict different times to the source I consider official and want my customers to see.

This is a fair point from the point of view of the TOCs, but on this basis it is inaccurate to claim that Darwin is Open Data, when in reality it is a commercial service with costs and restrictions attached to it that just so happens to have a "free tier" in its pricing structure.
 
> Do we think the DfT need to put Open Data clauses in future franchise agreements to ensure operators make a minimum level of data ‘open’?

Personally, very much yes! But it would be even better if we could convince them of the benefits that could come from encouraging innovation by doing this sooner, and without having to be compelled by the government.
 
> Out of interest, is it only the charges for using Push Port data that stop you from using Darwin’s forecasts?  For example, if it were financially feasible (zero-cost or a modest profit) for you to use the forecasts, would you definitely use them?

For me, that's a big part of it. However, I consider the requirement to use the Darwin forecasts exclusively a major issue too, for reasons that have been restated enough times by different people already in this thread, but essentially because I find it frustrating to have the opportunities for innovation with the data made available be arbitrarily limited.

--

George

Tom Cairns

Feb 10, 2015, 11:35:28 AM
to Peter Hicks, openrail...@googlegroups.com
On 10/02/2015 16:03, "Peter Hicks" <peter...@poggs.co.uk> wrote:
> Here’s an interesting take on it - if you look at NRE as a supplier of ‘shared services’ to an industry (a bit like Tyrell is), would you not need the buy-in of the TOCs supplying the manual input to be able to distribute it?

> If I stepped in to the shoes of a TOC, I might be comfortable with the trustworthiness of Darwin and therefore happy to supply data to it via Darwin Workstation - but uncomfortable about making the same inputs available to others who may predict different times to the source I consider official and want my customers to see.

You certainly would need the 'buy-in' of TOCs, but I think that most would be willing to share that information independently of Darwin in principle as it means that the correct operational information is disseminated more widely. The concern about availability of inputs to others implies that there is still a concern about users doing the wrong thing with the data provided – there needs to be a change in thinking on this. If I recall correctly, the same concern existed with the release of timetables, realtime feeds, fares data and the world hasn’t really blown up on any of them. 

Users of open data, at least in the short term, are not going to be able, or aim, to supplant the existing major supplier of this data. Indeed, the likelihood in the medium and long term is the same. Most of the travelling public are not going to know, or care, about different data sources as I said before… but there are always going to be those who seek alternatives. I think, because of that, the apparent huge worry about forecasted times being different is way OTT – after all, if we all end up being just about right then everything converges on the same output values!!! Anyone who cares about their products and services will do their utmost to ensure their accuracy yet the worry seems to be that they will not.

> Do we think the DfT need to put Open Data clauses in future franchise agreements to ensure operators make a minimum level of data ‘open’?

Yes - pretty sure we’ve had this discussion before between at least the two of us a couple of years ago.

> Out of interest, is it only the charges for using Push Port data that stop you from using Darwin’s forecasts?  For example, if it were financially feasible (zero-cost or a modest profit) for you to use the forecasts, would you definitely use them?

The charges are only a secondary consideration. There are two sides to the question of using Darwin forecasts, seeing as the question is aimed at me – a personal angle and a financial one. Darwin won’t be able to fill all the gaps in for the predictions, so I’d still have to run my own ‘fill the prediction gap’ service for those if it’s permitted under licence… and for freight trains I’d still need my own prediction system. From a business and financial angle, it might make sense to, so I may begin to consider it. From a personal angle – and at the end of the day I still see RTT as a personal project – I believe standing up for what I believe in is more important.

Now, for others who are in my position (although I suspect I am the only one or one of very few), it might work differently. So in that situation, dropping the costs may make it viable in both senses. But overall, dropping the costs substantially would be a good step for many, but it still isn’t what the aim should be – it almost makes the NROD feeds a second class citizen in my eyes yet they have a far greater overall potential.

Tom

Where-in-Sussex

Feb 10, 2015, 4:21:38 PM
to openrail...@googlegroups.com, pe...@retep.org.uk
One of the problems I see with being forced to use Darwin's predictions is what happens when your own server has identified that the prediction is wrong.

For example (and resignalling will change this particular one this weekend!), if an up train on the Bexhill branch is late, when Bexhill signal box is closed out, the signal section is effectively about 7 miles long.  The trouble is, CIS and Darwin aren't aware of when the closing switch is pulled, and believe there are five signal sections where there is actually now one.  If the up Ashford-Brighton is late, then the Ore-Brighton service shows as "on time" and then "1 late", "2 late" etc until the train passes Pevensey Signal box.

If you've set up your algorithm to identify boxes that close and other local anomalies (or even a stopper in front of a fast, which I've seen predicted to overtake on plain line), then you KNOW that the Darwin prediction is WRONG.

So, as the terms are at the moment, you're being forced to tell passengers information that you are 99% certain is incorrect by maybe 10 minutes!
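
The plain-line case above (a following train predicted to arrive ahead of the one in front of it) could be caught with a check along these lines. This is only a rough sketch under stated assumptions: the `Prediction` record, the `clamp_following` function and the default `headway_min` value are hypothetical names and numbers made up for illustration, not part of Darwin or any Network Rail feed, and it assumes both trains share one line with no passing place before the next timing point.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    train_id: str     # hypothetical identifier for the service
    arrival_min: int  # predicted arrival at the next timing point, minutes past midnight


def clamp_following(leading: Prediction, following: Prediction,
                    headway_min: int = 3) -> Prediction:
    """Return a corrected prediction for the following train.

    Assumes no overtaking is possible, so the following train cannot
    arrive earlier than the leading train plus a minimum headway.
    The 3-minute default is an illustrative assumption only.
    """
    earliest = leading.arrival_min + headway_min
    if following.arrival_min < earliest:
        # The raw prediction implies an impossible overtake: push it back.
        return Prediction(following.train_id, earliest)
    return following
```

A closed signal box could be handled the same way by using a larger headway for the affected section, since the following train cannot enter the long section until the leading train has cleared it.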

Peter Hicks

Feb 11, 2015, 12:18:23 PM
to Tom Cairns, openrail...@googlegroups.com

On 10 Feb 2015, at 16:35, Tom Cairns <t...@swlines.co.uk> wrote:

> You certainly would need the 'buy-in' of TOCs, but I think that most would be willing to share that information independently of Darwin in principle as it means that the correct operational information is disseminated more widely. The concern about availability of inputs to others implies that there is still a concern about users doing the wrong thing with the data provided – there needs to be a change in thinking on this. If I recall correctly, the same concern existed with the release of timetables, realtime feeds, fares data and the world hasn’t really blown up on any of them.

I think the concern is more around presenting something different from what’s everywhere else, rather than it being a worry about doing something morally or ethically wrong.

I also think it’s vitally important to listen, engage and discuss the industry’s information strategy with the right people. Merely saying that there was concern over timetables, fares data etc. ignores what the concerns were at the time. Opinions and views change - but not without proper discussion and investigation, and I suspect not by trying to gloss over a discussion that needs to take place.

Let’s engage with NRE, listen to their concerns, understand the industry’s perspective and work on a mutually acceptable way forward. There seems to be an unspoken expectation that opening up Darwin should allow competition, but I don’t think it has to be like that… I’d certainly love to take Darwin’s predictions for OpenTrainTimes, because somebody else has done the heavy work for me, and I can innovate in broader, more useful ways than ‘accuracy’.

> Users of open data, at least in the short term, are not going to be able, or aim, to supplant the existing major supplier of this data. Indeed, the likelihood in the medium and long term is the same. Most of the travelling public are not going to know, or care, about different data sources as I said before… but there are always going to be those who seek alternatives. I think, because of that, the apparent huge worry about forecasted times being different is way OTT – after all, if we all end up being just about right then everything converges on the same output values!!! Anyone who cares about their products and services will do their utmost to ensure their accuracy yet the worry seems to be that they will not.

You’re right - passengers won’t care about different data sources, but they, or TOCs, will care about different information being given out on different channels. Again, it’s not about accuracy right now - it’s about consistency. They are still two very different things - consistency coming first, then better accuracy later.

I am sure it hasn’t escaped the industry’s attention that Darwin isn’t always right…!

> The charges are only a secondary consideration. There are two sides to the hat on the question of using Darwin forecasts seeing as the question is aimed at me – from a personal and financial angle. Darwin won’t be able to fill all the gaps in for the predictions, so I’d still have to run my own ‘fill the prediction gap’ service for those if it’s permitted under licence… and for freight trains I’d still need my own prediction system. From a business and financial angle, it might make sense to so I may begin to consider it. From a personal angle – and at the end of the day I still see RTT as a personal project, I believe standing up for what I believe in is more important.

Again, you’re right to stand up for what you see as important - but also think of the bigger picture here, and think about people who aren’t as clever as you - developers and people who just want to know when their train’s going to turn up. Those people outnumber everyone else - how do you explain to them that your site will show “better” forecasts than other sites in certain cases?

I think you still need to release a period’s worth of historical prediction data from your site so we can independently compare the predictions from both RealTimeTrains and Darwin.
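
A minimal sketch of how such a comparison might be scored, assuming each source's predicted arrival times and the actual arrival times have already been matched up per train and expressed in minutes. The function name `mean_abs_error` and the choice of mean absolute error as the measure are assumptions for illustration; neither RealTimeTrains nor Darwin is known to publish data in this shape.

```python
from statistics import mean


def mean_abs_error(predicted, actual):
    """Mean absolute error, in minutes, between predicted and actual times.

    `predicted` and `actual` are equal-length sequences of minutes past
    midnight for the same trains at the same locations, in the same order.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))
```

Running this once per source against the same set of actual times gives one figure per source to compare; a lower value means that source's predictions were closer on average over the sample.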

> Now, for others who are in my position (although I suspect I am the only one or one of very few), it might work differently. So in that situation, dropping the costs may make it viable in both senses. But overall, dropping the costs substantially would be a good step for many, but it still isn’t what the aim should be – it almost makes the NROD feeds a second class citizen in my eyes yet they have a far greater overall potential.

So you’re saying the charges are a secondary consideration, but not the key issue here?

How does it make the feeds from NR a second class citizen?


Peter


Peter Hicks

Feb 11, 2015, 12:34:22 PM
to George Goldberg, openrail...@googlegroups.com
Hi George

On 9 Feb 2015, at 20:24, George Goldberg <geo...@grundleborg.com> wrote:

> As a relative newcomer to this community without any clear picture of who works on what (and for whom), I've been left a little bit confused by some of the discussions in this thread. I initially assumed you were representing National Rail Enquiries in this discussion, but I now realise that I probably jumped to an incorrect conclusion there. I think it would be hugely beneficial to myself and other relative newcomers like me if affiliations were made a bit clearer on this mailing list.

Sure thing.

I work for Rockshore, who run the Network Rail ‘Data Feeds’ platform. I’ve been working with the real-time feeds from before this existed, and I was one of the people who worked to get them generally available. I’ve been working with the real-time feeds for nearly four years now, and CIF data for even longer than that.

I’ve been involved in a number of projects at NR, and I’m regularly in meetings with people from all over the business, so I get a good view of what data’s there and not yet released, and I can get people talking to get data opened up.

Rockshore are also building the system that’s going to distribute the Darwin Push Port data, and I have a hand in that - albeit more from a customer proxy (developers being the ultimate customer) perspective.

In my spare time, I run OpenTrainTimes, which is proving quite popular, but not as much as other sites.

All my posts here are from my personal email address, as they're my views - those might not be the same views as others.


Peter


Craig Parker

Feb 12, 2015, 2:27:41 PM
to openrail...@googlegroups.com, martind...@gmail.com, peter...@poggs.co.uk, cbai...@gmail.com, pe...@retep.org.uk
Regarding "lack of access to verify availability of advance tickets", are there any APIs out there that would give you that data?

I am still trying to decipher the ATOC fares data but I guess it doesn't have any availability logic in it at all? Is this the case even if you paid for the daily fares dump that they have?


Thanks,
Craig.

On Wednesday, 4 February 2015 11:16:09 UTC, Tom Cairns wrote:
It’s probably worth pointing out that ticket prices are actually out in
the open via RSP (http://data.atoc.org) albeit only updated per fares
manual release I believe. The missing bit on that one is the lack of
access to NRS to verify the availability of advance tickets.

Tom
 
--
Tom Cairns





On 04/02/2015 11:13, "Martin Swanson" <martind...@gmail.com> wrote:

>My reservation about Darwin is it is controlled by the ATOC. It is
>therefore influenced by public companies. I notice there is still a lot
>of useful data that is still kept private - routing guide, delay
>attribution, ticket pricing etc. I'm not sure you can really call this
>open data - it is data taken from Network Rail, processed by a body
>controlled by public companies, and then licensed under terms that don't
>seem wholly attractive
>

Paul Kelly

Feb 12, 2015, 5:43:56 PM
to openraildata-talk
Craig Parker wrote:
> regarding "lack of access to verify availability of advance tickets" are
> there any API's out there that would give you that data?

No. Although historic patterns of availability on given trains could
theoretically be crowd-sourced to a certain extent, I suppose, if there
was a website where people could input details of the fare they had to
pay on a certain train. Probably not worth the effort, though.

> I am still trying to decipher the ATOC fares data but I guess it doesnt
> have any availability logic in it at all? Is this the case even if you
> paid for the daily fares dump that they have?

Depends on what you mean by availability logic. Stuff like booking
deadlines, allowed connecting TOCs for "TOC & Connections" advance fares
is there - this varies by ticket type and is shown underneath each
advance fare in brfares.com "simple mode".

But as far as I know the daily feed is identical in content/structure to
the thrice-yearly feed. There is a real-time element to advance fare
availability that can't be captured in static data, and you need access
to the reservation service to check that.

Paul

Craig Parker

Feb 13, 2015, 9:10:57 AM
to openrail...@googlegroups.com
Hi Paul,

thanks for this info.

Regarding the "reservation service" you mention, is this some kind of API that ATOC Accredited Ticket Issuers would get access to?

What about NRE? Do you know if they have any feeds (paid for or otherwise) that would give you availability?



Thanks,
Craig.

Mike Flynn

Mar 9, 2015, 6:24:37 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Hi Lindsay,

Is this the right medium?  I'm sending emails to Anita and you but with no reply for a few days again now. 

Anyway, I'm waiting 6 weeks now for my application (refs 621 and 622).  Any news?

Mike :)

Mike Flynn

Mar 9, 2015, 7:03:10 AM
to openrail...@googlegroups.com, pe...@retep.org.uk
Thanks, I have reply now :)