Timetable Updates - A Vote

Peter Hicks

Feb 10, 2015, 8:08:23 AM
to openraildata-talk
All,

After some investigation, I think I’ve found a way forward to the problem of missing timetable updates.

At the moment, the process to generate these files runs at 0001, processes a CIF update from Network Rail, creates ‘update’ and ‘full’ files, then uploads them all to Amazon S3.

I’m proposing:

  • Starting the generation at 0200
  • Uploading the ‘update’ files as soon as they’re generated (possibly earlier than they are now, but hopefully no later)
  • Creating the ‘full’ files and uploading them as soon as they’re generated

I’m anticipating this will fix the problem of late reception of data from Network Rail resulting in *no* data being produced.
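A minimal sketch of the ordering proposed above, assuming each file type has its own generation step and that uploading is a separate call. The names `publish_in_order`, `generate` and `upload` are hypothetical and are not taken from the actual job. The point it illustrates is that each file is uploaded as soon as it exists, so a failure while building the ‘full’ file no longer prevents the ‘update’ file from being published.

```python
from typing import Callable, Iterable, List, Tuple

# Each step is (label, generate), where generate() returns the file contents.
Step = Tuple[str, Callable[[], bytes]]


def publish_in_order(steps: Iterable[Step],
                     upload: Callable[[str, bytes], None]) -> List[str]:
    """Generate and upload each file in turn.

    A file is uploaded immediately after it is generated, so files
    published earlier are unaffected if a later step fails.
    """
    published = []
    for label, generate in steps:
        try:
            data = generate()
        except Exception as exc:
            print(f"{label}: generation failed ({exc}); stopping here")
            break
        upload(label, data)
        published.append(label)
    return published


def _make_update() -> bytes:
    return b"update contents"  # placeholder data


def _make_full() -> bytes:
    raise RuntimeError("source data arrived late")  # simulated failure


uploaded = {}
result = publish_in_order(
    [("update", _make_update), ("full", _make_full)],
    upload=lambda label, data: uploaded.__setitem__(label, data),
)
print(result)
```

In this run the ‘update’ file is still uploaded even though the ‘full’ step fails.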

For those of you who use the JSON feed - is this acceptable?


Peter

petermount

Feb 10, 2015, 8:25:37 AM
to openrail...@googlegroups.com
I'm presuming this will affect just the JSON feeds & not the CIF ones?

Going by the other thread where JSON was missing for recent mornings, the CIF was there.

Peter

Peter Hicks

Feb 10, 2015, 8:41:20 AM
to petermount, openrail...@googlegroups.com
Hi Peter

On 10 Feb 2015, at 13:25, petermount <peter...@gmail.com> wrote:

> I'm presuming this will affect just the JSON feeds & not the CIF ones?
>
> Going by the other thread where JSON was missing for recent mornings, the CIF was there.

Correct - it’s just for the JSON feeds, as they’re processed. The CIF feeds have only a very light layer of processing to remove some freight data.


Peter


Kevin Fullerton

Feb 10, 2015, 12:08:26 PM
to openrail...@googlegroups.com
One request if there's a change to the JSON generation/processing ... Is there a reason why both the full and daily update files have this key in the header? Can it be changed so that the update files have "update" rather than "full" in them?

"Metadata":{"type":"full","sequence":965}
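A rough sketch of how a consumer might read this key, assuming the header is available as a JSON object with a top-level "Metadata" member (the surrounding structure is an assumption; only the fragment above comes from the feed). Because "type" currently reads "full" in both kinds of file, the sketch relies on "sequence" instead, and `sequence_gap` assumes the number goes up by one per file, which is not confirmed here. Both function names are hypothetical.

```python
import json


def read_metadata(header_json: str) -> dict:
    """Return the Metadata object from a JSON header string."""
    return json.loads(header_json)["Metadata"]


def sequence_gap(previous: int, current: int) -> int:
    """Number of files apparently missed between two sequence numbers.

    Assumes the sequence increases by exactly one per file.
    """
    return max(0, current - previous - 1)


sample = '{"Metadata":{"type":"full","sequence":965}}'
meta = read_metadata(sample)
print(meta["type"], meta["sequence"])
print(sequence_gap(previous=963, current=meta["sequence"]))
```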

Many thanks

Kevin


--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To post to this group, send an email to openrail...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Phil Wieland

Feb 11, 2015, 3:34:22 AM
to openrail...@googlegroups.com
Sounds good to me.

Phil

Michael Pritchard

Feb 11, 2015, 4:24:00 AM
to openraildata-talk
ok from me

--
--------------------------------------------------------------
Michael Pritchard
Web      :: http://www.blueghost.co.uk
GMail    :: blueg...@gmail.com
--------------------------------------------------------------

Chris Bailiss

Feb 11, 2015, 4:24:45 AM
to openrail...@googlegroups.com, peter...@gmail.com

Hi Peter

I consume the CIF file now, so apologies for taking your discussion on a tangent for a moment.

>> The CIF feeds have only a very light layer of processing to remove some freight data. 

If I recall, late last year there were some occasions where the CIF daily update file failed to be produced and also a rather prolonged period where the weekly full file was repeatedly missed.  Again if I recall correctly, this was all due to large timetable updates being processed back at NR which meant the whole export process for all the CIF file users (i.e. not just NROD) took longer, meaning the NROD CIF file arrived later than the time that the NROD CIF file processing job was scheduled for.  As a result, we were a day behind on the daily file / there was only a very outdated full file (several weeks old) available for a time.

If I recall, you were going to speak to NR about the possibility of moving the NROD file higher in the export job.  Also, there was talk of making the NROD CIF processing a bit more flexible, to work even if the NR file was late.

I was wondering where this got to and what changes were made in the end?

Thanks

Chris

Peter Hicks

Feb 11, 2015, 4:44:08 AM
to Chris Bailiss, openrail...@googlegroups.com, peter...@gmail.com
Hi Chris

On 11 Feb 2015, at 09:24, Chris Bailiss <cbai...@gmail.com> wrote:

> If I recall, late last year there were some occasions where the CIF daily update file failed to be produced and also a rather prolonged period where the weekly full file was repeatedly missed. Again if I recall correctly, this was all due to large timetable updates being processed back at NR which meant the whole export process for all the CIF file users (i.e. not just NROD) took longer, meaning the NROD CIF file arrived later than the time that the NROD CIF file processing job was scheduled for. As a result, we were a day behind on the daily file / there was only a very outdated full file (several weeks old) available for a time.

That’s correct - the CIF was processed “at midnight”, which naturally doesn’t cater for it being received from NR after midnight. I’m told it’s due to the architecture of the system and assumptions that were valid a couple of years ago.

> If I recall, you were going to speak to NR about the possibility of moving the NROD file higher in the export job. Also, there was talk of making the NROD CIF processing a bit more flexible, to work even if the NR file was late.
>
> I was wondering where this got to and what changes were made in the end?

There’s only so much flexibility NR have to prioritise the Open Data CIF over others - it’s currently at the highest available priority group, because there are other operational systems which need to receive data after TRUST does (such as automatic route setting systems in signal boxes). It’s hard to tell if this priority change has made a difference - the times that CIF files are produced depend on a number of factors we don’t have any visibility of, such as other users who want to receive a full extract that night, the volume of schedule updates etc.

One of the other changes, which I believe is in testing at the moment, is to make the CIF processing event-driven rather than time-driven.

Related, one of the things I was mulling over was having the weekly CIF generated on a Sunday night, if that’s possible. I’m guessing there are fewer schedule updates taking place during Sunday than other days of the week, so it may spread the load of generating a full extract out to a day when it’s likely to get executed quicker.

Some other things that could (not necessarily will, as they’re less trivial code changes) be done are:

* Automatically FTP-ing CIF files out to users who are capable of receiving them automatically, so the data is received when it’s available
* Calling a URL on a client system to notify that the CIF file is ready
* Sending a message on a ‘notification’ queue when the CIF file is ready
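As an illustration of the third option, here is a minimal sketch of a consumer that reacts to a "file ready" notification instead of polling at a fixed time. Python's in-process `queue.Queue` stands in for a real message broker, and the message format (a JSON object with a "file" field) and the file names are assumptions made for the example, not a published format.

```python
import json
import queue

STOP = None  # sentinel used here to end the loop; a real consumer would run indefinitely


def consume(notifications: queue.Queue, on_file_ready) -> list:
    """Block until a notification arrives, then hand the file name to a callback."""
    handled = []
    while True:
        raw = notifications.get()  # blocks until a message is available
        if raw is STOP:
            break
        note = json.loads(raw)
        on_file_ready(note["file"])
        handled.append(note["file"])
    return handled


# Simulate the producer side announcing two files, then shutting down.
q = queue.Queue()
q.put(json.dumps({"file": "daily-update.cif"}))   # hypothetical file name
q.put(json.dumps({"file": "weekly-full.cif"}))    # hypothetical file name
q.put(STOP)

fetched = []
handled = consume(q, on_file_ready=fetched.append)
print(handled)
```

The same callback could start a download, which keeps the client logic identical whether the file arrives on time or several hours late.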


Peter

Martin Swanson

Feb 11, 2015, 5:04:49 AM
to Peter Hicks, Chris Bailiss, openrail...@googlegroups.com, peter...@gmail.com
Peter

The third option you describe - sending a message on a dedicated topic when a file is available - would be the best option from a design perspective.

It means we have an event driven process consistent with the other feeds.

Martin

petermount

Feb 11, 2015, 5:09:43 AM
to openrail...@googlegroups.com, cbai...@gmail.com, peter...@gmail.com

> Related, one of the things I was mulling over was having the weekly CIF generated on a Sunday night, if that’s possible.  I’m guessing there are fewer schedule updates taking place during Sunday than other days of the week, so it may spread the load of generating a full extract out to a day when it’s likely to get executed quicker.
 
That might be an idea. Recently Fridays & Saturdays have been busier. Just looking at this weekend's logs, file sizes were as follows (NB line counts, not schedule counts):

Fri 5 189367
Sat 6 465599
Sun 7 9131

So I'd agree, Sunday would be better.
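To put those line counts side by side, a small sketch that picks the quietest day and expresses the others relative to it. The figures are the three quoted above; the variable names are just for illustration.

```python
# Line counts per day, as quoted from the logs above.
line_counts = {"Fri": 189367, "Sat": 465599, "Sun": 9131}

# The quietest day is the one with the fewest lines.
quietest = min(line_counts, key=line_counts.get)

for day, count in line_counts.items():
    ratio = count / line_counts[quietest]
    print(f"{day}: {count} lines, {ratio:.1f}x the quietest day ({quietest})")
```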


> Some other things that could (not necessarily will, as they’re less trivial code changes) be done are:
>
>   * Automatically FTP-ing CIF files out to users who are capable of receiving them automatically, so the data is received when it’s available
>   * Calling a URL on a client system to notify that the CIF file is ready
>   * Sending a message on a ‘notification’ queue when the CIF file is ready

I'd go with the notification option at first as that one would be easier to implement on your end. For most of us who are event driven anyhow (i.e. we have our own brokers, be it ActiveMQ or, in my instance, RabbitMQ) it would fit in fine.

Also, later on you could then implement either of the first two based on that queue, making that side easier. Some of us could even implement those (probably the URL one) if necessary.

Peter

Chris Bailiss

Feb 11, 2015, 5:32:33 AM
to openrail...@googlegroups.com, cbai...@gmail.com, peter...@gmail.com
Hi Peter

Thanks for the reply.

>> There’s only so much flexibility NR have to prioritise the Open Data CIF over others

Makes sense.  I wasn't saying this should have been done.  I just remember it coming up in the discussion so wondered how it panned out in the end. 

>> One of the other changes, which I believe is in testing at the moment, is to make the CIF processing take place event-driven, rather than time-driven. 

Great.  Event driven is far preferable to time driven.

>> Related, one of the things I was mulling over was having the weekly CIF generated on a Sunday night, if that’s possible.

The night of the week makes no difference to me from a consumer point of view, so whatever is more reliable seems sensible.  I assume you would then receive a full file from NR on the Sunday too and apply the same light processing.

>> Some other things that could (not necessarily will, as they’re less trivial code changes) be done are: 
>> * Automatically FTP-ing CIF files out to users who are capable of receiving them automatically, so the data is received when it’s available 
>> * Calling a URL on a client system to notify that the CIF file is ready 
>> * Sending a message on a ‘notification’ queue when the CIF file is ready

+1 also for the notification message.  I assume it would be consumable in the same way as the other feeds.  Also, sounds easier to implement for you.  However, after such a notification is sent, it is possible many people would all try to download immediately and simultaneously.  I am not sure where you are serving the schedule files from, but if it is from a single VM, might that cause some load issues?  I have observed in the past, that when the messaging system is under stress (e.g. during one of the full/partial-outages), the schedule file downloads would be intermittently unavailable too, with the HTTP error stating the server was overloaded.  I guess schedule processing and downloading should be overnight, but if the schedule files and messaging are running off the same machine(s), then perhaps some care would be needed to ensure one can't knock the other over, e.g. in the event of a late schedule file arriving in the middle of the day (if this is even possible).

Chris

Peter Hicks

Feb 11, 2015, 5:40:53 AM
to Chris Bailiss, openrail...@googlegroups.com, peter...@gmail.com

On 11 Feb 2015, at 10:32, Chris Bailiss <cbai...@gmail.com> wrote:

> I have observed in the past, that when the messaging system is under stress (e.g. during one of the full/partial-outages), the schedule file downloads would be intermittently unavailable too, with the HTTP error stating the server was overloaded. I guess schedule processing and downloading should be overnight, but if the schedule files and messaging are running off the same machine(s), then perhaps some care would be needed to ensure one can't knock the other over, e.g. in the event of a late schedule file arriving in the middle of the day (if this is even possible).

CIF and JSON files are served from Amazon S3 - to get one, you call a URL and authenticate, the server creates a one-time (think it’s valid for 5 minutes) URL on S3, and then sends an HTTP redirect back to the client with this URL.
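A client-side sketch of that flow using only Python's standard library. It assumes HTTP Basic authentication on the first request, which is not stated above, and the URLs are placeholders. The names `DropAuthOnRedirect`, `authenticated_request` and `download` are hypothetical. By default `urllib` copies request headers onto the redirected request, so the sketch removes `Authorization` before following the redirect: this avoids sending credentials to a different host, and some storage back ends may reject a request that carries both a signed URL and an `Authorization` header.

```python
import base64
import shutil
import urllib.request


class DropAuthOnRedirect(urllib.request.HTTPRedirectHandler):
    """Follow redirects, but do not forward the Authorization header."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            new_req.remove_header("Authorization")
        return new_req


def authenticated_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a request carrying HTTP Basic credentials (an assumed auth scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download(url: str, username: str, password: str, dest_path: str) -> None:
    """Authenticate, follow the redirect to the short-lived URL, and save the file."""
    opener = urllib.request.build_opener(DropAuthOnRedirect)
    request = authenticated_request(url, username, password)
    with opener.open(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Since the redirect target is described as short-lived, a client would request a fresh URL for each download rather than storing the redirected one.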

The reason this is done is so there are stats on how many people are using the schedule feeds - if it were entirely open, it’d be really hard to tell who’s using what, so it becomes difficult to justify supplying the data.


Peter


Dave Butland

Feb 18, 2015, 2:25:15 PM
to openrail...@googlegroups.com, cbai...@gmail.com, peter...@gmail.com
Peter, 

I've just rewritten my schedule update process because of the missing JSON files recently. I'm just testing the new CIF process and can consume the full file in just a few minutes, de-duplicating LO, LI, BS and BX records on the fly. This removes a lot of the duplication in the file and creates a much more sensible schedule structure that can be interrogated for changes. It's still not perfect but every little step forward helps. Processing the delta files should be almost instantaneous and I don't expect the full check every weekend to take more than a couple of minutes either. I'm rather proud of this code (and it's clean and tidy for once).

I will use the old JAXB classes to spit out a JSON file in the current format you are using and compare against your full and delta feeds. Happy to share the code if it will be of some use to you. I'll stick it on GitHub when I'm done. About time I set up my git repository anyway.

Dave. 