Archive of GTFS Realtime?


Shawn Connor

Apr 25, 2016, 9:53:01 AM
to MBTA Developers
Hello MBTA Developers!
For a data science class project, my group and I are interested in applying some machine learning techniques to MBTA data to determine what "typical" performance looks like for the subway system, then combining that with other related transportation data sets to explore relationships between them. For example: Does Hubway ridership increase when trains are running late? How "bad" are the impacts that the MBTA releases (or doesn't release) advisories about?

We don't actually need very much data compared to the overall realtime feed. A few months of records of arrivals by stop_id for one subway line (preferably Red or Green) seems about right. In concept, I think it's the same data you'd need to build a Marey diagram (like the ones in the excellent http://mbtaviz.github.io/). If I'm reading the MBTA statistics correctly, the "final" desired data set would be on the order of 10,000 records a day for the Red Line.

Does anyone know if this kind of archive is available anywhere?

Thanks in advance for the help,
Shawn


Developer at MBTA

Apr 25, 2016, 12:01:41 PM

Hi Shawn,

Great timing! We just released a set of performance data API calls that should provide the data you’re looking for. You can find the documentation here: http://realtime.mbta.com/Portal/Content/Documents/MBTA-realtime_Performance_APIDocumentation_v0_9_1_2016-04-14_public.pdf. The “headways” call provides both the time a train arrived at a station and the time since the last train, and the “traveltimes” call would give you a sense of whether the system is running slower than normal. If you have any questions, please feel free to continue the discussion here.

Sincerely,

developer@mbta
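To make the “time since the last train” idea concrete: a headway is simply the gap between consecutive arrivals at the same stop. The sketch below assumes you already have arrival times for one stop and one direction as Unix epoch seconds. The function name `headways_from_arrivals` is hypothetical, and the “headways” call itself returns these values directly, so this only illustrates the arithmetic.

```python
def headways_from_arrivals(arrival_epochs):
    """Pair each arrival with the seconds elapsed since the previous train.

    arrival_epochs: arrival times at a single stop (one direction), as Unix
    epoch seconds. The earliest arrival has no predecessor, so it is skipped.
    """
    ordered = sorted(arrival_epochs)
    # zip(ordered, ordered[1:]) walks consecutive (previous, current) pairs
    return [(curr, curr - prev) for prev, curr in zip(ordered, ordered[1:])]


# Hypothetical arrivals, deliberately out of order
print(headways_from_arrivals([1453712400, 1453712940, 1453712580]))
# → [(1453712580, 180), (1453712940, 360)]
```

Sorting first means the input does not need to arrive in order; a run of unusually large gaps is the kind of signal you would compare against "typical" performance.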

Jeff Cunningham

Apr 26, 2016, 11:12:04 PM
This is great. I'm actually working with Shawn on this, and we're pulling down a data set today. We seem to be unable to pull traveltimes data for the terminal stations. For example, nothing is returned for Alewife to Davis or vice versa:
http://realtime.mbta.com/developer/api/v2.1/traveltimes?api_key=wX9NwuHnZU2ToO7GmGR9uw&format=json&from_stop=70064&to_stop=70061&from_datetime=1453712400&to_datetime=1454317140

Are we missing something?

Thanks,
Jeff
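For reference, a request like the one above can be assembled programmatically. This sketch only reproduces the query parameters visible in that URL (`api_key`, `format`, `from_stop`, `to_stop`, `from_datetime`, `to_datetime`, with the datetimes as Unix epoch seconds). The helper names are hypothetical, `YOUR_API_KEY` is a placeholder, and the v2.1 endpoint may no longer be served. The stop pair is the Davis/Alewife pair from the message above, which, as the MBTA reply explains, returns nothing because terminals were not yet supported.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint copied from the example request above (may no longer be live)
BASE_URL = "http://realtime.mbta.com/developer/api/v2.1/traveltimes"


def to_epoch(dt):
    """Convert a timezone-aware datetime to integer Unix epoch seconds."""
    return int(dt.timestamp())


def traveltimes_url(api_key, from_stop, to_stop, start, end):
    """Build a traveltimes query URL using the parameters seen above."""
    params = {
        "api_key": api_key,
        "format": "json",
        "from_stop": from_stop,
        "to_stop": to_stop,
        "from_datetime": to_epoch(start),
        "to_datetime": to_epoch(end),
    }
    return BASE_URL + "?" + urlencode(params)


# 09:00 UTC on 2016-01-25 is 4:00 AM in Boston (EST); this reproduces the
# from_datetime=1453712400 and to_datetime=1454317140 values in the URL above.
url = traveltimes_url(
    "YOUR_API_KEY", "70064", "70061",
    datetime(2016, 1, 25, 9, 0, tzinfo=timezone.utc),
    datetime(2016, 2, 1, 8, 59, tzinfo=timezone.utc),
)
print(url)
```

Using timezone-aware datetimes avoids silently interpreting the window in the machine's local time zone.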

Developer at MBTA

Apr 27, 2016, 10:00:36 AM
Hi Jeff,

We don't currently support terminals in the data for some technical reasons that we're working on resolving right now. All other stations should be present though. If you want some context for the data you're getting, check out this explainer document: http://realtime.mbta.com/Portal/Content/Documents/Interpreting_mbta_performance_API_output_2016-04-26.pdf

Sincerely,
developer@mbta

Shawn Connor

Apr 27, 2016, 10:03:38 AM
Is there a record anywhere of when stop numbers change? I had the same problem when I tried using December 2015's GTFS schedule: it turned out I was getting empty responses because the stations I was querying had different stop numbers. I switched to the GTFS file from March and almost everything worked, though we still have the problem Jeff describes.
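If no change log exists, one way to spot renumbered stops is to compare the stops.txt files from two GTFS versions. A minimal sketch, assuming you have the text of both files; it compares only the `stop_id` column, so a stop whose ID changed shows up once as removed and once as added. The function names and sample rows are hypothetical.

```python
import csv
import io


def read_stop_ids(stops_txt):
    """Return the set of stop_id values in the text of a GTFS stops.txt file."""
    return {row["stop_id"] for row in csv.DictReader(io.StringIO(stops_txt))}


def diff_stop_ids(old_stops_txt, new_stops_txt):
    """Return (removed, added): stop_ids only in the old feed, only in the new."""
    old_ids = read_stop_ids(old_stops_txt)
    new_ids = read_stop_ids(new_stops_txt)
    return sorted(old_ids - new_ids), sorted(new_ids - old_ids)


# Made-up sample rows, not real MBTA stops
OLD = "stop_id,stop_name\nA1,Station One\nB1,Station Two\n"
NEW = "stop_id,stop_name\nA1,Station One\nB2,Station Two\n"
print(diff_stop_ids(OLD, NEW))  # → (['B1'], ['B2'])
```

When reading real files, opening them with `encoding="utf-8-sig"` guards against a UTF-8 byte-order mark ending up in the first column name.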

Developer at MBTA

Apr 27, 2016, 10:20:05 AM
Hi Shawn,

The stop_id values in our GTFS feed do not change very often, in general. In fact, we would be interested to hear which specific stops you had this problem with. However, you should always use the most recent GTFS file when interpreting outputs from the MBTA-realtime API. To make sure you have the most current GTFS file, check the feed_info.txt file as described on our GTFS page under "Getting the File." For historical data, you could check out our new archive of GTFS files, which is described under "GTFS Archive" on the same page.

Another way to get a list of stop_ids is to use the "stopsbyroute" API call, described in our API documentation.

Sincerely,
developer@MBTA
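As a rough illustration of a feed_info.txt check: the GTFS spec defines optional `feed_start_date` and `feed_end_date` fields (YYYYMMDD) in that file, so one reasonable check is whether the dates you are querying fall inside that range. This is a sketch under that assumption, not the procedure from the MBTA GTFS page; `feed_covers`, `parse_gtfs_date`, and the sample file contents are hypothetical.

```python
import csv
import io
from datetime import date, datetime


def parse_gtfs_date(value):
    """Parse a GTFS YYYYMMDD date string; empty or missing values give None."""
    return datetime.strptime(value, "%Y%m%d").date() if value else None


def feed_covers(feed_info_txt, day):
    """True if `day` lies within feed_start_date..feed_end_date (inclusive).

    Both fields are optional in GTFS, so a missing bound is treated as open.
    """
    row = next(csv.DictReader(io.StringIO(feed_info_txt)))
    start = parse_gtfs_date(row.get("feed_start_date"))
    end = parse_gtfs_date(row.get("feed_end_date"))
    return (start is None or start <= day) and (end is None or day <= end)


# Made-up sample contents, not a real MBTA feed_info.txt
SAMPLE = (
    "feed_publisher_name,feed_publisher_url,feed_lang,"
    "feed_start_date,feed_end_date,feed_version\n"
    "Example,http://example.com,en,20160301,20160630,sample-1\n"
)
print(feed_covers(SAMPLE, date(2016, 4, 25)))  # → True
print(feed_covers(SAMPLE, date(2015, 12, 1)))  # → False
```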

Shawn Connor

May 29, 2016, 8:11:21 PM
In case anyone is curious, we published our final project (with links to our R code) here:

Kris K

Mar 16, 2018, 10:11:26 AM
Hi Shawn,
I am interested in working on a data science project using the MBTA API, and I would like to take a look at your project.
However, the link above is dead. If your project is available at a new location, could you please share it?

Thank you,
Kris

Jeff Cunningham

Mar 19, 2018, 5:57:19 PM