TD Anonymiser bug


Chris Northwood

Oct 28, 2015, 8:29:29 AM
to openraildata-talk
Here's an interesting one I've just spotted: this service http://www.realtimetrains.co.uk/train/Y58099/2015/10/28/advanced has been anonymised to "152G" on the TD feed. I'm not sure there's any reason it should be (it's an in-service passenger train that my friend is currently on)...

Chris

Juhani Pirttilahti

Oct 30, 2015, 8:00:25 AM
to A gathering place for the Open Rail Data community
Hi,

There might be another 1E06 running, or cancelled, operated by a freight operating company. On some occasions, there isn't. Sometimes the TD feed just sticks to an obfuscated headcode for no apparent reason while the Train Movement feed is still running with the "real" headcode.

Personally, I'm all against this whole obfuscation thing because it's just scrambling the data and confusing us without any apparent benefit to companies. There are now multiple open sources for train service codes and service groups which do connect individual services to companies. Therefore it isn't the headcode that should be obfuscated. This is just my opinion though.

Peter Hicks

Oct 30, 2015, 9:07:04 AM
to Juhani Pirttilahti, A gathering place for the Open Rail Data community

On 30 Oct 2015, at 12:00, Juhani Pirttilahti <juhani.pi...@gmail.com> wrote:

> There might be another 1E06 running, or cancelled, by freight operating company. In some occasions, there isn't. Sometimes TD feed just sticks to an obfuscated headcode for no apparent reason while the Train Movement feed is still running with the "real" head code.

Can you share an example where the TD feed is using an obfuscated headcode but the Train Movement feed isn’t? I can have a look into why this is.

> Personally, I'm all against this whole obfuscation thing because it's just scrambling the data and confusing us without any apparent benefit to companies. There are now multiple open sources for train service codes and service groups which do connect individual services to companies. Therefore it isn't the headcode that should be obfuscated. This is just my opinion though.


The freight operating companies (except GBRf, who are happy for their data to be open) are nervous about having data on their trains made wholly public. For this reason, when a train operated by a freight operator is ‘activated’ on TRUST, its reporting number is scrambled so that the class stays the same but the other three characters are different. Similarly, in the CIF and JSON schedule data, the service code is removed, as are the train category and reporting number.
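As a rough illustration of the scrambling described above, here is a minimal Python sketch. It is not the real TRUST or TD obfuscation code: the function name `scramble_headcode` is hypothetical, and the output shape (class digit, two digits, one letter) is only an assumption inferred from the obfuscated values quoted elsewhere in this thread, such as 152G and 164H.

```python
import random
import string


def scramble_headcode(headcode: str, rng: random.Random) -> str:
    """Keep the class (first character) and replace the other three.

    Hypothetical sketch only: the real algorithm is not public.
    """
    if len(headcode) != 4 or not headcode[0].isdigit():
        raise ValueError(f"not a four-character headcode: {headcode!r}")
    train_class = headcode[0]
    # Assumed output shape: two digits followed by a letter (e.g. 152G).
    digits = f"{rng.randrange(100):02d}"
    letter = rng.choice(string.ascii_uppercase)
    return train_class + digits + letter


if __name__ == "__main__":
    # A class 1 train keeps its leading "1"; the rest is randomised.
    print(scramble_headcode("1E06", random.Random()))
```

The point of the sketch is only that the class digit survives, so a consumer can still tell an express passenger or parcels working (class 1) from other classes even when the rest of the identity is hidden.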

The freight industry is happy with this - unlike passenger operators, freight doesn’t fall under a ‘public interest’ test because it’s purely commercial. You could argue that freight movements interact with passenger train movements, but the data out there is - I think - enough to work with.

To get this changed, we need to come up with a compelling argument to do so. Here are some examples of arguments for, and counter-arguments to illustrate that it’s not a simple case of “we want it, therefore we should have it”:

Argument: GB Railfreight have allowed their data to be opened up and nothing bad has happened to their trains
Counter-argument: That’s their decision, our business is different

Argument: It’s possible to work out who operates each train anyway, so you might as well make it all open
Counter-argument: We’ll require that the data is further obfuscated to make it wholly impossible to do so - possibly even having all our train movement data removed from the public feeds

Argument: WTT schedules are in the Working Timetable which Network Rail publish, so they should be un-obfuscated in the feeds
Counter-argument: Many of those WTT schedules don’t always run

If you, or anyone else, can come up with a robust argument *for* making all the freight movement data go out in the clear, I’m happy to get this issue re-addressed.



Peter


Juhani Pirttilahti

Oct 30, 2015, 7:04:29 PM
to A gathering place for the Open Rail Data community, juhani.pi...@gmail.com
I agree that this is a complicated case of information sensitivity. Re-evaluating the situation now might just backfire on us through the sudden death of freight data. The current approach may actually be a good balance between culturally different parties who are all in the same boat.

In Finland, we don't do that 'public interest' test. Government data is public unless otherwise stated by law. Even then the data can be partially released when non-public parts have been redacted.

Some years ago Finnish State Railways (VR) leaked real-time GPS positions of their locomotives and EMUs/DMUs to the general public by accident (apparently because of a server misconfiguration). They did not shut it down until almost two years later. I'm pretty sure they knew the whole time. They were actually using my services, built on top of that data, in their day-to-day operations. During that time, rail data became quite popular in the community and the industry. But they shut it down. By that time, the ship had already turned around, and the implementation of the Finnish Government's open data policy was the final move towards open rail data.

The Finnish Transport Agency has now opened its rail data service. It includes schedules, train compositions and real-time data for passenger and freight services. Similar limitations are in place for freight data with respect to confidential business information. For freight, train categories and compositions are considered commercially sensitive and are not published. But the name of the operator and the reporting number are public. So the whole of Finland is in the open and nothing bad has happened.

In the UK, restrictions concerning reporting numbers could similarly be reconsidered. The reporting number is already a matter of public record, as it's used to identify a slot (or a path, whatever it's called) on the railway. Redacted track access agreements are published by the Office of Rail Regulation (ORR). They don't seem to edit reporting numbers or even service codes out of the documents. This could be just another case of the WTT argument.

I can't recommend all of it to go in the clear. I could say it would be nice, but I don't want to. Commercially sensitive information, such as the train category, should be considered confidential. Only the "freeing" of train reporting numbers makes some sense, because they don't actually reveal that much. They just identify the slot to a train that runs from place A to place B. And at the same time, that could fix some minor annoyances we currently experience with the data feeds.

Juhani Pirttilahti

Nov 6, 2015, 2:36:52 AM
to A gathering place for the Open Rail Data community, juhani.pi...@gmail.com
Hi,

I now have one example: train 521Y44M805, which was activated on 05/11/2015 at 21.00.

It is the 22.00 Liverpool Street - Ipswich service 1Y44 and in the TD feed it was running as 164H. It's obfuscated because 1Y44 happens to be the running number also for the 15.28 Folkestone West - London Victoria (train uid P90494), a charter operated by DB Schenker (a FOC living in the shadows). The Ipswich train continued to use that obfuscated running number even though the charter train had already terminated at London Victoria several hours earlier (18.10) and the Ipswich train wasn't activated before 21.00. Maybe the system doesn't release obfuscated running numbers until after some time period.

I also looked up the 1E06 mentioned earlier: it is the 22.45 Willesden Prdc - Low Fell Royal Mail Terminal (train uid P90459), a DB Schenker operated parcel service. It wasn't running anywhere near the 6.50 Glasgow Central - London Kings Cross (train uid Y58099), which is also 1E06. The same thing happened the next day, this time with the headcode obfuscated to 170G, as the system is designed to do. The obfuscator can't really tell those two trains apart. Unfortunately some passenger trains are permanently affected by this feature.
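For anyone wanting to spot affected trains in their own TD consumer, a simple pattern check might help. This is only a heuristic sketch: it assumes obfuscated descriptions always look like the three seen in this thread (152G, 164H, 170G: three digits then a letter), which I can't confirm, and `classify_headcode` is a name I made up.

```python
import re

# Usual headcode shape: class digit, letter, two digits (e.g. 1E06, 1Y44).
REGULAR = re.compile(r"[0-9][A-Z][0-9]{2}")
# Assumed obfuscated shape, based only on 152G, 164H and 170G above.
OBFUSCATED = re.compile(r"[0-9]{3}[A-Z]")


def classify_headcode(descr: str) -> str:
    """Return 'regular', 'obfuscated' or 'other' for a TD description."""
    if REGULAR.fullmatch(descr):
        return "regular"
    if OBFUSCATED.fullmatch(descr):
        return "obfuscated"
    # Anything else seen in a berth falls through to here.
    return "other"


if __name__ == "__main__":
    for descr in ("1E06", "170G", "6I25", "164H"):
        print(descr, classify_headcode(descr))
```

A passenger schedule whose TD description classifies as "obfuscated" would then be a candidate for the collision described above.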

By the way, this morning I noticed that some London Underground services at Wimbledon use running numbers different from those in the schedule. For example, 1I65 => 1I52, and 1I61 => 6I25. I've seen many similar occurrences in recent days.



Peter Hicks

Nov 6, 2015, 2:41:24 AM
to Juhani Pirttilahti, A gathering place for the Open Rail Data community
Hi Juhani

When a train terminates, the obfuscation mapping should be removed - but I think it's hanging around for 24 hours for some reason.  However, the root cause is that the obfuscation code doesn't differentiate between two trains in different parts of the country, because it's reasonably tricky to handle different instances of the same train ID on different TDs in the existing architecture of the system.
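To make the failure mode concrete, here is a toy Python model of it. It is not the real system: the class name `HeadcodeOnlyObfuscator`, the method names and the exact 24-hour retention are assumptions for illustration, and the values are taken from Juhani's 1Y44 example.

```python
from datetime import datetime, timedelta

# Assumed retention, based on the mapping "hanging around for 24 hours".
RETENTION = timedelta(hours=24)


class HeadcodeOnlyObfuscator:
    """Toy model: the mapping is keyed by headcode alone, with no
    notion of which TD area or which train instance it belongs to."""

    def __init__(self) -> None:
        self._mapping: dict[str, tuple[str, datetime]] = {}

    def register(self, headcode: str, alias: str, terminated_at: datetime) -> None:
        # The mapping outlives the train by the retention period.
        self._mapping[headcode] = (alias, terminated_at + RETENTION)

    def describe(self, headcode: str, now: datetime) -> str:
        entry = self._mapping.get(headcode)
        if entry is not None and now < entry[1]:
            return entry[0]  # any train with this headcode is caught
        return headcode


if __name__ == "__main__":
    obf = HeadcodeOnlyObfuscator()
    # Freight-operated charter 1Y44 terminates at 18.10.
    obf.register("1Y44", "164H", datetime(2015, 11, 5, 18, 10))
    # The unrelated 22.00 passenger 1Y44 is still obfuscated.
    print(obf.describe("1Y44", datetime(2015, 11, 5, 22, 0)))
```

Because the lookup key carries nothing but the four characters, an unrelated passenger train elsewhere in the country picks up the freight operator's alias until the entry expires.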

I'm not holding my breath that this bug will ever get fixed, I'm afraid.


Peter



Nigel Mundy

Nov 7, 2015, 9:30:51 AM
to Peter Hicks, openraildata-talk, Juhani Pirttilahti

On the obfuscation saga again: NR already publish a freely accessible set of (scheduled) freight/departmental services with headcodes, albeit in PDF form.

http://tinyurl.com/nwgl36k (e.g. book GA).

I also believe the TSDB is NR owned, the TOCs/FOCs are contributors to it, so shouldn't it be NR's sole call as to whether to anonymise or not?

Nigel.
