Downloading the full daily schedule


Benjamin Ricketts

Apr 21, 2013, 3:20:40 PM
to openrail...@googlegroups.com
Hi all,

I've only recently started looking into the various data feeds, so please excuse me if this post is something I should already know!

I'm trying to download the full daily schedule, and any updates required from day to day, using some form of scheduled task.

I've created a basic HTTP request to the following URL:


The request includes my username and password via an HTTP Authorization header.
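For reference, an HTTP Basic Authorization header carries the word "Basic" followed by the Base64 encoding of "username:password". The shell sketch below builds that value by hand. The credentials are placeholders, and basic_auth_header is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Build an HTTP Basic Authorization header value from a username and password.
# On this feed the username is the account's email address; the values used
# below are placeholders only.
basic_auth_header() {
  # printf avoids the trailing newline echo would add; tr strips the line
  # wrapping some base64 implementations insert after 76 characters.
  printf 'Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

basic_auth_header 'user' 'pass'   # prints: Basic dXNlcjpwYXNz
echo
```

curl's -u username:password option produces the same header for you, which is what the working curl command later in the thread relies on.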

When I get a response back I'm always getting the content of a login page (https://datafeeds.networkrail.co.uk/ntrod/spring_security_login) rather than being redirected to the Amazon S3 file.

Just as a side note, I can access the files after visiting that URL manually and typing in my credentials.

I guess my questions are:
  • Can the schedule files be downloaded in this way?
  • Is the HTTP authorization header what I need to use in order to achieve this download?
Many thanks!

Ben.

Peter Hicks

Apr 21, 2013, 3:37:54 PM
to openrail...@googlegroups.com
Hi Ben


Yes, you can automate them - but I have to admit, the cookie-based authentication isn't the simplest thing in the world to get around.  I have an improvement request logged to make the redirect URLs accept HTTP authentication as well as cookie-based.

From memory, I think you first need to submit an HTTP POST request with your username and password to /ntrod/j_spring_security_check, store the cookie that gets set when you authenticate successfully, and then use this cookie when you request the redirect URL.
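The three steps Peter lists can be sketched with curl's cookie jar options (-c saves cookies, -b sends them back). This is a hedged sketch, not a tested recipe: the form field names j_username and j_password are an assumption (they are the Spring Security 3 defaults for j_spring_security_check, and this thread does not confirm them), and the function name and temporary cookie file are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of the cookie-based flow: log in once, keep the session cookie,
# then request the redirect URL with that cookie.
# ASSUMPTION: login form fields are j_username / j_password (Spring Security
# 3 defaults). ntrod_cookie_download is a hypothetical helper name.
BASE='https://datafeeds.networkrail.co.uk/ntrod'

ntrod_cookie_download() {
  local user=$1 pass=$2 url=$3 out=$4
  local jar
  jar=$(mktemp) || return 1

  # Step 1: POST the credentials; -c writes cookies the server sets to $jar.
  curl -s -o /dev/null -c "$jar" \
       --data-urlencode "j_username=$user" \
       --data-urlencode "j_password=$pass" \
       "$BASE/j_spring_security_check" || { rm -f "$jar"; return 1; }

  # Steps 2 and 3: send the stored cookie (-b) and follow the redirect (-L).
  curl -s -f -L -b "$jar" -o "$out" "$url"
  local status=$?
  rm -f "$jar"
  return "$status"
}

# Example (hypothetical environment variables; note the quoted URL):
# ntrod_cookie_download "$NR_USER" "$NR_PASS" \
#   "$BASE/CifFileAuthenticate?type=CIF_ALL_FULL_DAILY&day=toc-full" toc-full.gz
```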

Paul Kelly (on this list) wrote a script to download the weekly ATOC data automatically - see https://groups.google.com/forum/?fromgroups=#!search/automatically$20getting$20the$20atoc/openraildata-talk/CZw2CF_yI-w/6GTY0pFgfEoJ - although I'm not sure if anyone's succeeded in downloading the JSON schedule data yet.

If nobody has, I'll write up some instructions this week.

Cheers,


Peter


Benjamin Ricketts

Apr 21, 2013, 4:30:51 PM
to openrail...@googlegroups.com
Hi Peter,

Thanks for such a quick response. I had a feeling it didn't support HTTP authentication; I'll have a play around and see what I can do.

Thanks!

Michael Pritchard

Apr 21, 2013, 4:37:24 PM4/21/13
to Benjamin Ricketts, openrail...@googlegroups.com
The following works in C#, so it is possible; hopefully it will help with whatever language you are using:


using System;
using System.IO;
using System.Net;
using System.Text;

// requestUri, username, password and filePath must be defined beforehand.
WebRequest webRequest = WebRequest.Create(requestUri);
// Set the Basic credentials by hand so they are sent on the very first request.
string str = Convert.ToBase64String(Encoding.ASCII.GetBytes(username + ":" + password));
webRequest.Headers[HttpRequestHeader.Authorization] = "Basic " + str;
byte[] buffer = new byte[4096];

using (WebResponse response = webRequest.GetResponse())
using (Stream responseStream = response.GetResponseStream())
using (MemoryStream memoryStream = new MemoryStream())
{
    long total = 0;
    int count;
    // Read() may return fewer bytes than requested; 0 means end of stream.
    while ((count = responseStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        memoryStream.Write(buffer, 0, count);
        total += count;
        Console.WriteLine("Downloaded {0} bytes", total);
    }
    // Write the (still compressed) download to disk.
    File.WriteAllBytes(filePath, memoryStream.ToArray());
}

requestUri would be https://datafeeds.networkrail.co.uk/ntrod/CifFileAuthenticate?type=CIF_HE_TOC_FULL_DAILY&day=toc-full, and filePath would be the local path to save to. You then need to decompress the file.
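A minimal sketch of that last step in shell, assuming the download is gzip data (Jules saves the same feed as file.gz further down the thread). It first checks that the file really is gzip, since a saved login page (the symptom in the first post) would otherwise fail in a more confusing way later. The function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Decompress a downloaded schedule file, refusing to proceed if the download
# is not actually gzip data (e.g. an HTML login page saved by mistake).
# decompress_schedule is a hypothetical helper name.
decompress_schedule() {
  local gz=$1 out=$2
  # gzip -t tests integrity and fails on non-gzip input such as an HTML page.
  if ! gzip -t "$gz" 2>/dev/null; then
    echo "not a valid gzip file: $gz" >&2
    return 1
  fi
  # -d decompresses, -c writes to stdout so the original .gz is kept.
  gzip -dc "$gz" > "$out"
}

# Example: decompress_schedule file.gz file.json
```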


--
--------------------------------------------------------------
Michael Pritchard
Web     :: http://www.blueghost.co.uk
Email   :: blueg...@gmail.com
--------------------------------------------------------------

Benjamin Ricketts

Apr 21, 2013, 5:21:00 PM4/21/13
to openrail...@googlegroups.com
Hi Michael,

I just happen to be using C#, and that example works a treat. I was previously using something like:

request.Credentials = new NetworkCredential("username", "password");

Removed that and used your example and I've just got the file to download.

Thanks for the help. 

Jules

May 5, 2013, 1:04:36 PM5/5/13
to openrail...@googlegroups.com
I too am struggling with this, but I don't have the C# option available to me, so I'm trying the wget-with-basic-authentication route.


Mostly this reports:

Resolving datafeeds.networkrail.co.uk... 79.125.104.87
Connecting to datafeeds.networkrail.co.uk|79.125.104.87|:443... connected.
HTTP request sent, awaiting response... 503 Service Unavailable: Back-end server is at capacity
17:59:24 ERROR 503: Service Unavailable: Back-end server is at capacity.

However, occasionally it connects but reports:

Resolving datafeeds.networkrail.co.uk... 79.125.104.87
Connecting to datafeeds.networkrail.co.uk|79.125.104.87|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Resolving nr-datafeed-cif.s3.amazonaws.com... 178.236.4.122
Connecting to nr-datafeed-cif.s3.amazonaws.com|178.236.4.122|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
17:53:41 ERROR 400: Bad Request.


I know the feed is working as I can manually login and download the file through a browser.

Any help would be greatly appreciated.

Thanks, Jules

Peter Hicks

May 5, 2013, 1:16:35 PM5/5/13
to openrail...@googlegroups.com
Hi Jules


There was a problem with the service around the time you posted; are you still getting the 503 errors now?


Peter

Jonathon Hurley

May 5, 2013, 1:54:51 PM5/5/13
to openrail...@googlegroups.com
Jules,

I've recently been experimenting with the same thing as you, namely trying to use wget to download the schedules.  I managed to get to the same point you did, but using --save-cookies and --load-cookies to simulate the login page.

Seemingly, when using wget, the &day=toc-full part of the query string isn't passed on, and therefore an invalid filename is sent to Amazon S3.  This shows up in the error - https://nr-datafeed-cif.s3.amazonaws.com/CIF_ALL_FULL_DAILY%2Fnull.json?Expires=... should read https://nr-datafeed-cif.s3.amazonaws.com/CIF_ALL_FULL_DAILY%2Ftoc-full.json?Expires=...

I haven't worked out how to get around this yet.  Possibly there are some wget settings which need changing but I'm not sure which.

Regards

Jonathon


Jules

May 6, 2013, 6:01:04 AM5/6/13
to openrail...@googlegroups.com
Now getting 

ERROR 400: Bad Request

Jules

May 6, 2013, 6:03:35 AM5/6/13
to openrail...@googlegroups.com
Jonathon,

Ah good spot!

Now that the service seems to be operating, I'm stuck at the same point as you.

I can't see anything in the wget man pages to help with this, so it might be something in the way the redirection is coded.

Regards,
Jules

Jules

May 8, 2013, 4:40:19 PM5/8/13
to openrail...@googlegroups.com
Not sure if anything has changed, but I thought I'd give curl a try rather than wget and, after numerous parameter combinations, success!

curl -L -u username:password -o file.gz 'https://datafeeds.networkrail.co.uk/ntrod/CifFileAuthenticate?type=CIF_ALL_FULL_DAILY&day=toc-full'
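For a scheduled job, the same command can be wrapped so that transient failures such as the 503 errors reported earlier are retried and an error page is never saved as the .gz file. In the command above, -L follows the 302 redirect to S3 and -u sends Basic credentials; when following a redirect, curl only sends -u credentials to the initial host (unless --location-trusted is given), so they are not forwarded to S3. The sketch adds the real curl options --fail and --retry; the function name and the NR_USER/NR_PASS environment variables are assumptions.

```shell
#!/usr/bin/env bash
# Wrapper around the curl command above, for use from a scheduled job.
# fetch_schedule, NR_USER and NR_PASS are hypothetical names.
fetch_schedule() {
  local type=$1 day=$2 out=$3
  # Quoted, so '&day=' stays part of the URL.
  local url="https://datafeeds.networkrail.co.uk/ntrod/CifFileAuthenticate?type=${type}&day=${day}"

  # --fail turns HTTP errors (such as a 503) into a non-zero exit instead of
  # saving the error body; --retry re-attempts transient errors, including 503.
  if curl -s -L --fail --retry 5 --retry-delay 60 \
       -u "${NR_USER}:${NR_PASS}" -o "${out}.part" "$url"; then
    mv "${out}.part" "$out"   # only replace the previous file on success
  else
    rm -f "${out}.part"
    return 1
  fi
}

# Example: fetch_schedule CIF_ALL_FULL_DAILY toc-full file.gz
```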

Now all I've got to do is get a 2GB JSON CIF file parsed somehow...
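One way to avoid loading a 2GB file into memory is to stream it line by line. The sketch below assumes, without this thread confirming it, that the JSON schedule holds one JSON object per line, each keyed by its record type (for example JsonScheduleV1). Under that assumption it tallies record types straight from the compressed download; a real parser would hand each line to a JSON library instead of sed.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the decompressed schedule is one JSON object per line, with the
# record type as the first key, e.g. {"JsonScheduleV1":{...}}.
# The .gz file is streamed, so the full JSON is never held in memory at once.
# count_record_types is a hypothetical helper name.
count_record_types() {
  # sed keeps only the first key name of each line; sort | uniq -c tallies them.
  gzip -dc "$1" | sed -n 's/^{"\([A-Za-z0-9_]*\)".*/\1/p' | sort | uniq -c
}

# Example: count_record_types file.gz
```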

Jules

Jonathon Hurley

May 10, 2013, 7:00:29 AM5/10/13
to openrail...@googlegroups.com
Fantastic, thanks Jules.

Interestingly, the apostrophes (single quotes) around the URL seem vitally important in that command line; it wouldn't work without them.
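A likely reason, consistent with the null day value in Jonathon's earlier wget error, is that & is a shell control operator: unquoted, it ends the command, so everything after it is cut off the URL and run separately (here, as a variable assignment). The sketch below demonstrates this with printf standing in for curl or wget, so it needs no network access.

```shell
#!/usr/bin/env bash
# Demonstrates why the URL must be quoted: '&' ends a command in the shell.
# printf stands in for curl/wget so nothing is sent over the network.

# Unquoted: the inner bash runs "printf ... ?type=CIF_ALL_FULL_DAILY" in the
# background, then treats "day=toc-full" as a plain variable assignment.
unquoted=$(bash -c 'printf "%s\n" https://datafeeds.networkrail.co.uk/ntrod/CifFileAuthenticate?type=CIF_ALL_FULL_DAILY&day=toc-full; wait')

# Quoted: the whole URL, including &day=toc-full, is passed as one argument.
quoted=$(printf '%s\n' 'https://datafeeds.networkrail.co.uk/ntrod/CifFileAuthenticate?type=CIF_ALL_FULL_DAILY&day=toc-full')

echo "unquoted: $unquoted"   # the &day=toc-full part is missing
echo "quoted:   $quoted"
```

Double quotes work just as well as single quotes here; what matters is that the & is inside quotes.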



Jonathon

Chris Franks

Jan 29, 2014, 10:50:59 AM1/29/14
to openrail...@googlegroups.com
Hi, sorry to reopen this thread. I'm trying to get Michael's example code working using my credentials, & am getting a 403 error on the GetResponse() line. Given that it changes to a 401 when I use a bad password & that I can login & download the files manually, I'm stumped.

I'm assuming that the username is my email address as per the login page & not something daft here. Has anyone got any ideas? Ta in advance.

Chris

Peter Hicks

Feb 3, 2014, 3:58:32 AM2/3/14
to Chris Franks, openrail...@googlegroups.com
Hi Chris


Yup, your username is your email address.

Have you subscribed to the right timetable feed in the UI?


Peter

Chris Franks

Feb 3, 2014, 6:56:37 AM2/3/14
to openrail...@googlegroups.com
Thanks for replying! It should be the correct feed, & as I say I can download it through a browser using the same URI as the download code.

Peter Hicks

Feb 3, 2014, 9:41:34 AM2/3/14
to Chris Franks, openrail...@googlegroups.com


Have you tried it with quotes - “” - rather than apostrophes - ‘’ - ?


Peter

Chris Franks

Feb 3, 2014, 11:56:39 AM2/3/14
to openrail...@googlegroups.com
I'm not sure what you mean; there are no quotes or apostrophes in this code sample, nor in my code.

Chris Franks

Feb 6, 2014, 6:13:22 AM2/6/14
to openrail...@googlegroups.com
Hi all!

I found that the issue was down to the way that .NET prior to v4.5 handles unescaped characters in URI strings. Given that the issue is in Response.ResponseUri, & that MS have deprecated & overridden the relevant properties, I've changed .NET versions.

Cheers!
Chris