GTFS Merge Utility?


Frumin, Michael

Jan 12, 2012, 10:53:05 PM
to mtadevelop...@googlegroups.com
Devs,

OneBusAway, the open source software that powers MTA Bus Time, is configured primarily by loading one or more GTFS files. As such, we are internally producing and feeding into it:

1. A slightly improved Staten Island GTFS
2. A B63-Only GTFS (by automatically paring back the Brooklyn GTFS)

Soon we will push the SI improvements back down into the main GTFS-production process, at which point they will become part of the GTFS on the main developer data page.

In the meantime (and in any case, really) we want to publish a Bus Time-only GTFS. To do that, we need to merge the two GTFS files I mentioned above. To save us the googling and testing: is there a preferred utility for doing this out there somewhere?

Thanks,
Mike

Sunny

Jan 13, 2012, 4:46:36 PM
to mtadeveloperresources
Have a look at this, even though I haven't used it:

http://code.google.com/p/googletransitdatafeed/wiki/Merge

Frumin, Michael

Jan 16, 2012, 1:44:35 PM
to mtadevelop...@googlegroups.com
This appears to be focused on merging two GTFS feeds for the same network but for different time periods. It blew up when I tried using it to merge two different GTFS feeds for the same time period (i.e., Staten Island + B63).

Any other candidates?

Thanks,
Mike

Sunny

Jan 16, 2012, 2:24:11 PM
to mtadeveloperresources
It shouldn't be insanely difficult to manually merge the B63 into the
SI feed; it's just one route's worth of trips/stop times/stops.

Brian Ferris

Jan 13, 2012, 1:38:01 AM
to mtadevelop...@googlegroups.com
The main utility I know about is:


Let me know if that doesn't work for you.

Brian

Brian Ferris

Jan 16, 2012, 3:29:53 PM
to mtadevelop...@googlegroups.com
OK, how about this. I extended the OBA GTFS transformer tool to accept multiple input feeds, still writing one output feed. Assuming the following are true for your feeds:

1) Agencies, routes, and stops that should match between the feeds have the same IDs.
2) Trip, shape, and calendar service IDs are unique between the feeds.

Then this should work. I used it to merge the Staten Island + Manhattan feeds as an example. Check out the updated documentation at:


Something like:

java -jar gtfs-transformer-cli.jar --overwriteDuplicates feedA feedB combinedFeed

Let me know what happens.
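
The two assumptions above can be checked before attempting a merge. What follows is only a minimal Python sketch of such a check, not part of the OBA transformer: the function names read_ids and find_id_collisions are hypothetical, and it assumes each feed is a zip archive using the standard GTFS file and column names. It reports IDs that appear in both feeds for the tables that are supposed to be unique.

```python
import csv
import io
import zipfile

# Tables whose IDs should NOT be shared between the two feeds (assumption 2).
UNIQUE_TABLES = {
    "trips.txt": "trip_id",
    "shapes.txt": "shape_id",
    "calendar.txt": "service_id",
    "calendar_dates.txt": "service_id",
}


def read_ids(feed_zip_path, table, id_column):
    """Return the set of IDs in one table of a GTFS zip (empty if the table is absent)."""
    with zipfile.ZipFile(feed_zip_path) as feed:
        if table not in feed.namelist():
            return set()
        with feed.open(table) as raw:
            # utf-8-sig drops a leading byte-order mark if the file has one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return {
                row[id_column].strip()
                for row in csv.DictReader(text)
                if row.get(id_column)
            }


def find_id_collisions(feed_a_path, feed_b_path):
    """Map each table name to the sorted IDs that occur in both feeds."""
    collisions = {}
    for table, id_column in UNIQUE_TABLES.items():
        ids_a = read_ids(feed_a_path, table, id_column)
        ids_b = read_ids(feed_b_path, table, id_column)
        shared = ids_a & ids_b
        if shared:
            collisions[table] = sorted(shared)
    return collisions
```

An empty result from find_id_collisions(feedA, feedB) suggests assumption 2 holds; any table listed in the result names the IDs that would clash in the combined feed.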

Brian

Frumin, Michael

Jan 24, 2012, 11:14:39 AM
to mtadevelop...@googlegroups.com

Brian, thanks.

I tried it, and it looks like it's mostly working. The GTFS feeds I combined were:

https://s3.amazonaws.com/MTABusTime/google_transit_staten_island.zip (which, for now, we are slightly modifying for MTA Bus Time relative to the one posted on the MTA web site)

https://s3.amazonaws.com/MTABusTime/gtfs-b63.zip (itself a reduced version of the Brooklyn GTFS)

The result was https://s3.amazonaws.com/MTABusTime/gtfs_MTABusTime.zip, which you should be able to run through the GTFS validator and see the same problems I did, namely errors and warnings about duplicate IDs/rows in calendar.txt and calendar_dates.txt.
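
For anyone wanting to pin down which rows the validator is complaining about, here is a rough Python sketch that lists repeated keys in a GTFS table. It is an illustration only: duplicate_keys is a hypothetical helper, and it assumes the table text has already been read out of the merged zip. In GTFS, calendar.txt should have one row per service_id, and calendar_dates.txt one row per (service_id, date) pair.

```python
import csv
import io
from collections import Counter


def duplicate_keys(table_text, key_columns):
    """Return the key tuples that occur on more than one row of a GTFS table."""
    reader = csv.DictReader(io.StringIO(table_text))
    counts = Counter(
        tuple(row[column].strip() for column in key_columns) for row in reader
    )
    return sorted(key for key, count in counts.items() if count > 1)
```

Calling duplicate_keys(calendar_text, ["service_id"]) and duplicate_keys(calendar_dates_text, ["service_id", "date"]) on the merged feed's two files should return the offending service IDs and dates, if any.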

 

Thoughts?

Thanks,
Mike

Michael Frumin

Systems Engineering Manager

MTA Bus Customer Information Systems

2 Broadway, 27th Floor

o: 646-252-1117

c: 646-370-0388

mfr...@mtahq.org

Ken Conaway

Jan 27, 2012, 9:16:15 PM
to mtadevelop...@googlegroups.com
I've always been a little confused about the calendar.txt service IDs. I know the first part is when the current pick is effective but don't get the last two letters. I can't distinguish a weekday service ID from a Saturday or Sunday. Is there any explanation on what the letters mean?

Ken

David Turner

Jan 31, 2012, 5:04:40 PM
to mtadevelop...@googlegroups.com

It is not a good idea to derive anything at all from service ids; the
dates that a service runs on are described by the calendar.txt and
calendar_dates.txt files.

That said, if you really want to know, here's a list of all of the
letters and what they mean:
https://github.com/camsys/onebusaway-nyc/blob/master/onebusaway-nyc-transit-data-federation/src/main/java/org/onebusaway/nyc/transit_data_federation/bundle/tasks/stif/model/ServiceCode.java

There are two letters because the MTA's service day sometimes starts
before midnight, which GTFS doesn't support. So, sometimes trips from
the next day need to be included in the previous day.
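
To make the first point concrete, here is a small Python sketch of working out which services run on a given date from calendar.txt and calendar_dates.txt alone, without reading anything into the service ID itself. It is an assumption-laden illustration rather than anything from OneBusAway: active_service_ids and parse_gtfs_date are hypothetical names, and the inputs are the raw text of the two files using the standard GTFS column names.

```python
import csv
import io
from datetime import datetime

# Column names in calendar.txt, indexed to match date.weekday() (Monday is 0).
WEEKDAY_COLUMNS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def parse_gtfs_date(value):
    """Parse a GTFS date string (YYYYMMDD) into a datetime.date."""
    return datetime.strptime(value.strip(), "%Y%m%d").date()


def active_service_ids(calendar_text, calendar_dates_text, day):
    """Return the set of service_ids that run on the given datetime.date."""
    active = set()
    weekday_column = WEEKDAY_COLUMNS[day.weekday()]

    # Regular weekly pattern: the date must be in range and the weekday flag set.
    for row in csv.DictReader(io.StringIO(calendar_text)):
        start = parse_gtfs_date(row["start_date"])
        end = parse_gtfs_date(row["end_date"])
        if start <= day <= end and row[weekday_column].strip() == "1":
            active.add(row["service_id"].strip())

    # Exceptions: exception_type 1 adds service on a date, 2 removes it.
    for row in csv.DictReader(io.StringIO(calendar_dates_text)):
        if parse_gtfs_date(row["date"]) != day:
            continue
        exception_type = row["exception_type"].strip()
        if exception_type == "1":
            active.add(row["service_id"].strip())
        elif exception_type == "2":
            active.discard(row["service_id"].strip())

    return active
```

Joining the returned IDs against the service_id column of trips.txt then gives the trips that run on that date, whatever letters the IDs happen to contain.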
