Timothy Wu
Aug 27, 2012, 2:54:37 AM
to graph...@googlegroups.com
Hi,
I'm reading the code in GraphServer and I am still unfamiliar with it, so if I'm missing something please correct me.
In gtfsdb_to_scheduled_edges() in gdb_import_gtfs.py, the trip bundles are compiled. The outer for loop iterates over the trip bundles and the inner for loop iterates over the service_ids. Each iteration then calls bundle_to_boardalight_edges(), which queries the bundle for stop_time information (stop_time_bundles = bundle.stop_time_bundles(service_id)). Isn't that terribly inefficient?
In the GTFS feed I checked (AC Transit, Alameda County, CA, USA), each trip_id in trips.txt already corresponds to exactly one service_id. Is that true for all GTFS feeds? Since trip_id is dataset-unique in trips.txt, it should be, if I'm thinking correctly. So if you have several service_ids (AC Transit has 9), most calls to stop_time_bundles() will return an empty list and be rejected by the check in bundle_to_boardalight_edges():
    if len(stop_time_bundles)<2:
        return
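One way to test the one-service-per-trip assumption on any feed is to count the distinct service_id values seen for each trip_id in trips.txt. This is only a minimal sketch: the inline sample stands in for a real trips.txt, its values are invented, and the function name services_per_trip is made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for a GTFS trips.txt (sample values are invented).
SAMPLE_TRIPS_TXT = """route_id,service_id,trip_id
51,WKDY,t1
51,WKDY,t2
51,SAT,t3
"""

def services_per_trip(trips_file):
    """Map each trip_id to the set of service_ids it appears with."""
    seen = defaultdict(set)
    for row in csv.DictReader(trips_file):
        seen[row["trip_id"]].add(row["service_id"])
    return dict(seen)

mapping = services_per_trip(io.StringIO(SAMPLE_TRIPS_TXT))
# Any trip listed under more than one service_id would show up here.
multi_service_trips = {t: s for t, s in mapping.items() if len(s) > 1}
```

For a real feed, passing open("trips.txt", newline="") instead of the StringIO object should work the same way; an empty multi_service_trips would support the assumption for that feed.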
Considering that there are 11,853 records in trips.txt, that service_id is not indexed (with just 9 unique services, plus a few additional ones from calendar_dates.txt, indexing probably wouldn't help anyway), and that the query is a JOIN between the trip table and the relatively huge stop_time table (565,751 records here), wouldn't it be a lot more efficient to loop over the services first, and then check for trip bundles with that service?
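To make the difference concrete, here is a rough sketch of the two loop orders on toy data. It does not use Graphserver's actual classes: the bundles, trip_service mapping, and both function names are hypothetical, and each "probe" merely stands in for one stop_time query.

```python
from collections import defaultdict

# Hypothetical toy data: each bundle is a list of trip_ids, and each trip
# belongs to exactly one service_id (as in trips.txt).
trip_service = {"t1": "WKDY", "t2": "WKDY", "t3": "SAT", "t4": "SUN"}
bundles = [["t1", "t2"], ["t3"], ["t4"]]
service_ids = ["WKDY", "SAT", "SUN"]

def bundle_first(bundles, service_ids):
    """Bundle-major order: every bundle is probed for every service."""
    probes, hits = 0, []
    for i, bundle in enumerate(bundles):
        for sid in service_ids:
            probes += 1  # stands in for one stop_time query
            if any(trip_service[t] == sid for t in bundle):
                hits.append((i, sid))
    return probes, hits

def service_first(bundles, service_ids):
    """Service-major order: index bundles by service once, probe only matches."""
    by_service = defaultdict(list)
    for i, bundle in enumerate(bundles):
        for sid in {trip_service[t] for t in bundle}:
            by_service[sid].append(i)
    probes, hits = 0, []
    for sid in service_ids:
        for i in by_service[sid]:
            probes += 1  # only bundles known to run on this service
            hits.append((i, sid))
    return probes, hits

old_probes, old_hits = bundle_first(bundles, service_ids)
new_probes, new_hits = service_first(bundles, service_ids)
```

In this toy case the bundle-major order issues one probe per (bundle, service) pair, while the service-major order issues one per pair that actually has trips. Whether the saving carries over to the real importer depends on how the bundles are stored and queried, which this sketch does not model.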
Timothy