Querying trip_ids based on service_id

Timothy Wu

Aug 27, 2012, 2:54:37 AM

to graph...@googlegroups.com

Hi,

I'm reading the code in GraphServer and I am still unfamiliar with it, so if I'm missing something please correct me.

In gtfsdb_to_scheduled_edges() in gdb_import_gtfs.py, the trip bundles are compiled first. The outer for loop then iterates over the trip bundles and the inner for loop iterates over the service_ids, calling bundle_to_boardalight_edges() for each pair. Inside that function the bundle is queried for stop_time information (stop_time_bundles = bundle.stop_time_bundles(service_id)). Isn't that terribly inefficient?

In the GTFS feed I checked (AC Transit, Alameda County, CA, USA), each trip_id in trips.txt corresponds to exactly one service_id. Is that true for all GTFS feeds? Since trip_id is unique within trips.txt, I think it has to be. So if there are several service_ids (9 in the case of AC Transit), most calls to stop_time_bundles() are going to return an empty list and get rejected by the check in bundle_to_boardalight_edges():

if len(stop_time_bundles) < 2:
    return

Consider that trips.txt has 11,853 records, that service_id is not indexed (with just 9 unique services, plus a few additional ones from calendar_dates.txt, an index probably wouldn't help much anyway), and that the query is a JOIN between the trips table and the relatively huge stop_times table (565,751 rows here). Wouldn't it be a lot more efficient to loop over the services first and then look up the trip bundles that use each service?
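To make the concern concrete, here is a minimal sketch of the nested loop as I understand it, run against a toy in-memory SQLite database. It is not Graphserver's code: the table layout simply mirrors the GTFS files, and the bundles dictionary, the service_ids list and the stop_times_for() helper are all made up for illustration. It only counts how many (bundle, service_id) JOIN queries come back empty.

```python
import sqlite3

# Toy stand-in for a GTFS database. Table and column names follow the GTFS
# files; they are an assumption, not necessarily the schema gtfsdb creates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, service_id TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_sequence INTEGER,
                             stop_id TEXT, departure_time INTEGER);
""")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("t1", "WEEKDAY"), ("t2", "WEEKDAY"), ("t3", "SATURDAY")])

# Stop pattern of each toy trip; t1 and t2 share a pattern, t3 has its own.
patterns = {"t1": ["A", "B", "C"], "t2": ["A", "B", "C"], "t3": ["A", "C"]}
conn.executemany("INSERT INTO stop_times VALUES (?, ?, ?, ?)",
                 [(trip_id, seq, stop_id, 600 * seq)
                  for trip_id, stops in patterns.items()
                  for seq, stop_id in enumerate(stops)])

# Hypothetical bundles: trips grouped by stop pattern, keyed by a made-up name.
bundles = {"pattern_ABC": ["t1", "t2"], "pattern_AC": ["t3"]}
service_ids = ["WEEKDAY", "SATURDAY", "SUNDAY", "HOLIDAY"]


def stop_times_for(trip_ids, service_id):
    """Join trips to stop_times for one (bundle, service_id) pair."""
    marks = ",".join("?" * len(trip_ids))
    sql = ("SELECT st.trip_id, st.stop_id, st.departure_time "
           "FROM stop_times AS st JOIN trips AS t ON t.trip_id = st.trip_id "
           f"WHERE t.service_id = ? AND st.trip_id IN ({marks}) "
           "ORDER BY st.trip_id, st.stop_sequence")
    return conn.execute(sql, [service_id, *trip_ids]).fetchall()


total = empty = 0
for trip_ids in bundles.values():      # outer loop: trip bundles
    for service_id in service_ids:     # inner loop: every service_id
        total += 1
        if not stop_times_for(trip_ids, service_id):
            empty += 1

print(f"{empty} of {total} JOIN queries returned nothing")
```

In this toy data only 2 of the 8 pairs have any stop times, so the other 6 queries do work that is thrown away. Whether the real feed behaves the same way depends on how many service_ids the trips in each bundle actually use.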

Timothy

Timothy Wu

Aug 27, 2012, 3:15:17 AM

to graph...@googlegroups.com

To clarify, for myself (I am not entirely clear on this) and for others reading the previous post: I don't mean just swapping the two for loops, which would result in the same number of queries. I mean that compile_trip_bundles() in gtfsdb.py, which gives 456 bundles in my case (456 unique stop time patterns?), could also store the associated service_ids in each bundle (querying service_id from trip_id should be fast). Wouldn't that help speed things up?
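Roughly what I have in mind, as a sketch under assumptions rather than a patch: the trips table mirrors trips.txt, and the bundles dictionary, the service_ids_for() helper and the bundle_services mapping are hypothetical names, not anything that exists in gtfsdb.py. The idea is to look up each bundle's service_ids once, then only run the expensive stop_times query for pairs that can return something.

```python
import sqlite3

# Toy trips table mirroring trips.txt (an assumption about the schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id TEXT PRIMARY KEY, service_id TEXT)")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("t1", "WEEKDAY"), ("t2", "WEEKDAY"),
                  ("t3", "SATURDAY"), ("t4", "WEEKDAY")])

# Hypothetical bundles: trip_ids grouped by stop pattern.
bundles = {"pattern_ABC": ["t1", "t2", "t3"], "pattern_AC": ["t4"]}
all_service_ids = ["WEEKDAY", "SATURDAY", "SUNDAY", "HOLIDAY"]


def service_ids_for(trip_ids):
    """Return the set of service_ids used by the given trips.

    This only touches the small trips table, looked up by primary key,
    so it should be cheap compared to a JOIN against stop_times.
    """
    marks = ",".join("?" * len(trip_ids))
    rows = conn.execute(
        f"SELECT DISTINCT service_id FROM trips WHERE trip_id IN ({marks})",
        trip_ids)
    return {service_id for (service_id,) in rows}


# Computed once, at the time the bundles are compiled.
bundle_services = {name: service_ids_for(trip_ids)
                   for name, trip_ids in bundles.items()}

# Only these pairs would need the expensive stop_times query.
pairs_to_query = [(name, service_id)
                  for name in bundles
                  for service_id in all_service_ids
                  if service_id in bundle_services[name]]

print(len(pairs_to_query), "pairs instead of",
      len(bundles) * len(all_service_ids))
```

Here 3 pairs survive out of 8. Note that this also handles a bundle whose trips run under more than one service_id, which simply swapping the loops would not address.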

Again, I'd appreciate it if someone could tell me what I might be getting wrong. Thanks.
