GTFS-ride aggregation and GTFS versioning

Aaron Antrim

Oct 6, 2018, 5:22:37 PM

to gtfs...@googlegroups.com

We need to think about the implications of GTFS-ride for GTFS data publishing: How do we differentiate between a new GTFS dataset (aka feed version) that corrects an error in previously published data and one that describes an upcoming service change?

We need to think about how GTFS-ride data is aggregated over time: Is there a one-to-one relationship between GTFS-ride datasets and GTFS iterations? Or might GTFS-ride datasets encompass multiple GTFS datasets? Does it make sense to make any recommendation at all, or let GTFS-ride software decide how to handle this?

More discussion in this GitHub issue:

https://github.com/ODOT-PTS/GTFS-ride/issues/25

--
Aaron Antrim
President & Founder, Trillium
503.567.8422 ext. 3

Josie Kressner

Oct 8, 2018, 7:44:01 PM

to GTFS-ride

I think it is important to make a recommendation.

Error corrections should be versioned differently from new ridership data, which always covers a new time period and sometimes, but not necessarily, a new service.

John Levin

Oct 14, 2018, 9:47:36 PM

to GTFS-ride

The issue of multiple, changing GTFS file sets over time is a challenging one.

At Metro Transit (Minneapolis-St. Paul), we publish a new file set every week that contains the next seven weeks of schedules. So a month of GTFS-ride detail data would need to be matched to 4 or 5 different GTFS file sets (but only to a portion of each of those file sets). That gets both large and complex pretty quickly.
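To make the arithmetic above concrete, here is a minimal sketch of how service dates in one month might map onto weekly file sets. It assumes a hypothetical Monday publishing schedule and assumes that the most recently published file set is the one to match against for any given service date; neither assumption comes from Metro Transit's actual process, and the function names are illustrative only.

```python
from datetime import date, timedelta


def feed_for_service_date(service_date, publish_dates):
    """Return the latest publish date on or before service_date, or None."""
    candidates = [p for p in publish_dates if p <= service_date]
    return max(candidates) if candidates else None


def feeds_needed(start, end, publish_dates):
    """Group every service date in [start, end] under the file set chosen for it."""
    needed = {}
    day = start
    while day <= end:
        feed = feed_for_service_date(day, publish_dates)
        needed.setdefault(feed, []).append(day)
        day += timedelta(days=1)
    return needed


# Hypothetical schedule: one file set published every Monday.
publish_dates = [date(2018, 9, 24) + timedelta(weeks=i) for i in range(6)]

# Which file sets would a month of ridership data touch?
october = feeds_needed(date(2018, 10, 1), date(2018, 10, 31), publish_dates)
for feed, days in sorted(october.items()):
    print(feed, "covers", len(days), "service days")
```

Under these assumptions, October 2018 touches five file sets, and only a few days of each one are actually used, even though each file set contains seven weeks of schedules.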

Another approach would be to think about the original, public information GTFS file sets as something completely different from the historical, data matching GTFS file sets. In other words, we don’t HAVE to match GTFS-ride to the public GTFS files. We can instead, after the fact, create a special GTFS file set just for matching the GTFS-ride data. We would carefully craft that GTFS file set to contain all schedules needed by the GTFS-ride data, but not a lot more than that.
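A rough sketch of what "carefully crafting" such a file set could look like for trips.txt alone is shown below. It assumes the ridership records can be tagged with the file set they were recorded against, and it prefixes each trip_id with a file set label so that identical trip_ids from different file sets do not collide. The function name, the labels, and the prefixing scheme are all hypothetical, and a real combined set would also need stop_times.txt, calendar data, and the other files handled consistently.

```python
import csv
import io


def build_matching_trips(trips_by_feed, ridden):
    """Keep only the trips that ridership records actually reference.

    trips_by_feed: {feed_label: contents of that file set's trips.txt}
    ridden: set of (feed_label, trip_id) pairs seen in the ridership data
    """
    kept = []
    for feed_label, text in trips_by_feed.items():
        for row in csv.DictReader(io.StringIO(text)):
            if (feed_label, row["trip_id"]) in ridden:
                row = dict(row)
                # Assumed convention: prefix trip_id to keep it unique across file sets.
                row["trip_id"] = f"{feed_label}:{row['trip_id']}"
                kept.append(row)
    return kept


# Tiny made-up trips.txt contents from two weekly file sets.
trips_by_feed = {
    "2018-10-01": "route_id,service_id,trip_id\n5,WK,t1\n5,WK,t2\n",
    "2018-10-08": "route_id,service_id,trip_id\n5,WK,t1\n6,WK,t9\n",
}
ridden = {("2018-10-01", "t1"), ("2018-10-08", "t1")}

matching = build_matching_trips(trips_by_feed, ridden)
for row in matching:
    print(row["trip_id"], row["route_id"])
```

Trips t2 and t9 are dropped because no ridership record refers to them, which is the "not a lot more than that" part of the idea.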

On the plus side, this will make the use of the GTFS-ride data easier. I would presume that most of the problems that Aaron identifies could be avoided.

On the negative side, this will make the generation of the complete GTFS-ride + GTFS schedule data even harder. Are there existing tools that consume multiple GTFS file sets and output them as a single combined set? That would be a big help.