How can I extract certain data from GTFS data?

69 views
Skip to first unread message

Shervin Ataeian

unread,
Oct 1, 2020, 8:35:40 AM10/1/20
to TransLink Developers

I am doing a research project and I need 

- average travel times between stops of a bus network

- Max and Min headway

Is it possible to extract these from GTFS data? If yes, how? I can open the data in Excel but I do not know how to extract what I need.

I would appreciate it if someone could help me. Thanks

Christian Paul

unread,
Oct 1, 2020, 8:12:46 PM10/1/20
to TransLink Developers
Yes, but only according to the timetables. If you need actual data from buses you'll need to continuously download the GTFS live data or ask Translink for this data.

For the timetable analysis:
You'll need to combine values from multiple tables.
From stop_times you'll want to look at arrival_time, departure_time and stop_id. You may also want to filter some entries based on pickup_type and drop_off_type.
From stops you may want to identify bus stops by their stop_code (that's the one on Translink's signs) but to join the table with stop_times use stop_id.

Depending on your research question you may need more tables to identify routes (one entry per bus number) and trips (one entry per stop to stop ride) and the calendar ones for filtering out weekends/weekdays/holidays.

Christian Paul

unread,
Oct 1, 2020, 9:07:31 PM10/1/20
to TransLink Developers
I suggest to be very specific about the question, ideally making your final analysis reproducible by how specific you asked.

What's the date range or are you looking at one specific date?
Do you want to have a look at holidays or exclude them?
Are you looking at a "usual" weekday or at a day of the weekend?

And,
How do you define average travel time?
As a pedestrian coming to a stop at a specific time, having to wait for the next bus or starting at the time of departure?
The average of what? Every minute between 8 AM and 10 PM or every time a bus departs between 6 AM and 7 PM?
Will accessibility features for wheelchair users or visually impaired people be considered?
It's ok to opt for a simpler query, but (especially if this is for academic research) the reasoning should be taken note of.

How do you define headway?
The time between two buses of the same line and direction depart at a specific stop during their times of operation?
What about two bus lines serving almost the same stops (e.g. B-Lines)? Shall they be combined if the start and stop is served by both? Should a faster connection be preferred for long trips even if the bus departs later?
Do you want the night to be handled?
Do you want to need night buses merged with their daytime bus line? Those are listed separately and may not service every stop.

e.g.
* I want to look at all bus connections which at some point travel through Burnaby
* I want to look at a usual, non-holiday Wednesday.
* I want them distinguished by bus line, first stop (name) and final stop (name and stop code).
* I want the average travel times from the first stop to each following stop.
  * Or, I want the average travel time from each stop to the the following stop.
  * Or, I want the average travel time from every stop with a transfer option to the next stop with a transfer option.
* Separately, I want the min and max headway of each stop per bus line and direction between 8 AM and 10 PM.
  * Or, I want the the min and max headway of each stop per bus line and direction during operations. Furthermore, I want the first and last time the stop is serviced.
* I want this to be run on the dataset of 2020-09-25.
  * Or, I want this as a script so that it can be run on multiple datasets.

Additionally, it would be great to get an idea of what file formats you can
* I want the data as an Excel sheet:
  * Sheet 1: average_travel_times
    * Rows: route_short_name, route_long_name, start_stop_code, start_stop_name, end_stop_code, end_stop_name, …

Shervin Ataeian

unread,
Oct 8, 2020, 9:18:31 PM10/8/20
to TransLink Developers
Thank you so much for your help.

Actually, I am not looking for a specific date (or even a specific network). I need travel times between certain stops on working days or weekends by hour in a large network (with at least 90 routes). So, it seems that I need GTFS real-time data containing actual arrival and departure time of vehicles at stops so that I can calculate travel times directly. However, I am not sure where I can find this data. I have tried several websites but I do not seem to be able to find any real GTFS real-time data files. The websites just refer to each other and the more I click, the more confused I get. The static data can easily be found but I have had no luck with real-time data. I would really appreciate it if you would help me with this problem.
Reply all
Reply to author
Forward
0 new messages