Inconsistent ETag and Last-Modified headers for GTFS data


James Synge

Jun 21, 2015, 2:45:28 PM
to massdotd...@googlegroups.com
Developer at MBTA,

I'm writing a program that will be fetching http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip periodically in order to update the route/schedule info.

To avoid re-fetching the file when it has not changed, I planned to use the ETag and/or Last-Modified headers from the previous full fetch to determine whether the file has changed. But when I fetch it multiple times, these values aren't consistent. It appears that there are 5 IIS 7.5 servers, each with its own copy of the zip file, with slightly different Last-Modified and ETag values:
Last-Modified: Thu, 18 Jun 2015 21:06:22 GMT
Etag: "143bf4a0aaad01:0"
Last-Modified: Thu, 18 Jun 2015 21:08:38 GMT
Etag: "c310cf2aaad01:0"
Last-Modified: Thu, 18 Jun 2015 21:08:27 GMT
Etag: "92b5d8ebaaad01:0"
Last-Modified: Thu, 18 Jun 2015 21:04:22 GMT
Etag: "1c6b659aaad01:0"
Last-Modified: Thu, 18 Jun 2015 21:05:23 GMT
Etag: "48ff07daaad01:0"
Could you (MBTA/MassDOT folks) fix this somehow? For example, modify the script that copies the files so that it sets the same timestamp on each file after the copy.

Thanks, James

p.s. I'm also curious how other developers have dealt with this.

Charlie Dalsass

Jun 21, 2015, 4:54:00 PM
to massdotd...@googlegroups.com
I'm just pulling the file and doing an md5sum on the contents, comparing it against the last md5sum. This is probably more reliable than using ETags.
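
As a rough illustration of the checksum approach described above, here is a minimal Python sketch. The function names `md5_hex` and `feed_changed` are made up for this example, and it assumes the zip has already been downloaded into memory and that the caller stores the previous digest somewhere.

```python
import hashlib
from typing import Optional


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the downloaded bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def feed_changed(data: bytes, previous_digest: Optional[str]) -> bool:
    """Report whether the downloaded bytes differ from the last known copy.

    previous_digest is the hex digest saved after the previous download,
    or None if nothing has been downloaded yet (hypothetical storage).
    """
    return previous_digest is None or md5_hex(data) != previous_digest
```

Note that this only detects a change after the whole file has been fetched, so it does not reduce the amount of data downloaded.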

Charlie

--
You received this message because you are subscribed to the Google Groups "MBTA Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to massdotdevelop...@googlegroups.com.
To post to this group, send email to massdotd...@googlegroups.com.
Visit this group at http://groups.google.com/group/massdotdevelopers.
For more options, visit https://groups.google.com/d/optout.

James Synge

Jun 21, 2015, 7:19:53 PM
to massdotd...@googlegroups.com
That would be my inclination too, but one thing argues against this: file size (34MB)... complicated by the fact that I'm going to be trying to run this app in the free tier of Google App Engine, so I don't want to fetch it too often due to limited ingress quota, not that I've added up how much it might be; maybe it doesn't matter. And I'd like to do things "the right way" if possible, so if the MBTA developers are able to address this, I think we'd all benefit.

Oh, and the reason I noticed this: the 34MB file is 2MB too large to fetch in one request on App Engine, which has a 32MB limit. Therefore I was writing some code to fetch it in two chunks, but I found that the Last-Modified and ETag values frequently didn't match up between the requests.

How often do you fetch the file?

Charlie Dalsass

Jun 21, 2015, 9:01:25 PM
to massdotd...@googlegroups.com
I do that once per day. 

It's tricky breaking a download into parts like that. FYI: There is an HTTP "range request" which might help. See:


But that seems like overkill, and it may be better to pay a little (I can't imagine it would be too much), as your time is worth money too!
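
For anyone who does want to try range requests, the following is a minimal Python sketch, not a tested client for the MBTA servers. The helper names (`byte_ranges`, `range_headers`, `fetch_chunk`) are invented for this example. It assumes the server honors the standard `Range` and `If-Range` headers and that the total file size and the ETag of the first chunk are already known.

```python
import urllib.request
from typing import Dict, List, Optional, Tuple


def byte_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split a file of total_size bytes into inclusive (start, end) ranges."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]


def range_headers(start: int, end: int, etag: Optional[str] = None) -> Dict[str, str]:
    """Build the request headers for one chunk.

    If etag is given, If-Range asks the server to send the partial content
    only when its copy of the file still has that ETag.
    """
    headers = {"Range": f"bytes={start}-{end}"}
    if etag is not None:
        headers["If-Range"] = etag
    return headers


def fetch_chunk(url: str, start: int, end: int, etag: Optional[str] = None) -> bytes:
    """Fetch one byte range, failing unless the server answers 206."""
    request = urllib.request.Request(url, headers=range_headers(start, end, etag))
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            # A 200 here means the server ignored the range, for example
            # because the If-Range validator did not match its copy.
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

With several servers reporting different ETags, as described earlier in this thread, a later chunk may well land on a server whose ETag does not match, so a caller would need to retry or start over in that case.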

Charlie

Developer at MBTA

Jun 22, 2015, 5:10:36 PM
to massdotd...@googlegroups.com, cha...@dalsass.mobi
James, you're correct that the file is hosted on multiple servers and that the Last-Modified dates will vary. This is one reason we also make feed_info.txt available as a separate download at http://www.mbta.com/uploadedfiles/feed_info.txt . You can download that file and compare it to the feed_info.txt in the last GTFS you downloaded (comparing feed_version will suffice). If it's different, then there is a new GTFS to download.
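
The comparison described above could be sketched in Python roughly as follows. This assumes feed_info.txt is a standard GTFS CSV file with a header row that includes a feed_version column. The function names `feed_version` and `new_feed_available` are hypothetical, and the downloading and unzipping steps are left to the caller.

```python
import csv
import io
from typing import Optional


def feed_version(feed_info_text: str) -> Optional[str]:
    """Return feed_version from the first data row of feed_info.txt, if any."""
    # Strip a leading byte-order mark so the first header name parses cleanly.
    reader = csv.DictReader(io.StringIO(feed_info_text.lstrip("\ufeff")))
    for row in reader:
        return (row.get("feed_version") or "").strip() or None
    return None


def new_feed_available(remote_feed_info: str, local_feed_info: str) -> bool:
    """Compare the standalone feed_info.txt with the one from the last GTFS zip."""
    return feed_version(remote_feed_info) != feed_version(local_feed_info)
```

Here `remote_feed_info` would be the text of the separately hosted feed_info.txt, and `local_feed_info` the text of feed_info.txt extracted from the previously downloaded zip.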

Sincerely,
developer@mbta

James Synge

Jun 22, 2015, 5:33:45 PM
to massdotd...@googlegroups.com, cha...@dalsass.mobi
Excellent, thanks.
