Too bad nobody replied to this.
People changing their route_id between schedule updates is the single biggest problem for consuming software.
What better way to get software to misbehave than change unique identifiers?
Perhaps the person responsible can tell us how we are supposed to know which route in the previous data the new times refer to. Maybe they can also tell us how to keep route schedules at the same URL between schedule updates, so that bookmarks are preserved and search engine indexes are not corrupted?
Wouldn't it be nice.
The biggest loser is the public, whose software stops working. Next are the publishers, who waste their time and money on data sets that don't enjoy greater use. Next are the data consumers, whose software stops working through no fault of their own and who spend considerable time on patchwork to minimize the damage.
I have had this discussion many times, and it turns out there is software out there (I won't mention any names) that appends a suffix to the route_id with each schedule update. I have also heard there is software that makes the suffix optional, though I can't imagine why anyone would offer such an option.
Perhaps a simple fix is to deselect the option to append a suffix to the route_id. If the option isn't there, request an update from the software manufacturer. I think they know what they are supposed to do. We all do, or think we do. But even when we think we know, we also have to be convinced that it makes a difference when we don't do it.
One solution might be feedback to software manufacturers to stop marketing such low-quality software. If you can't deselect the option to append a suffix to the route_id, request the feature. If the manufacturer doesn't respond with an option to keep unique identifiers stable, you really should buy different software.
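Until the publisher fixes this, consumers are left with the patchwork mentioned above. A minimal sketch of one such workaround follows. It assumes the suffix always takes the form of a hyphen followed by digits, as in 96-140 and 96-145; that is a guess based on those two examples, not a documented format, and the function names are made up for illustration.

```python
import re

# Assumption: versioned ids look like "<route>-<digits>", e.g. "96-140".
# This pattern is inferred from the examples in this thread, not from any spec.
_VERSION_SUFFIX = re.compile(r"^(?P<base>.+)-\d+$")


def stable_route_key(route_id: str) -> str:
    """Return route_id with a trailing "-<digits>" suffix removed, if present."""
    match = _VERSION_SUFFIX.match(route_id)
    return match.group("base") if match else route_id


def group_by_stable_key(route_ids):
    """Map each stable key to the list of versioned route_ids that share it."""
    groups = {}
    for route_id in route_ids:
        groups.setdefault(stable_route_key(route_id), []).append(route_id)
    return groups
```

This is a guess layered on top of someone else's data. A route whose real identifier happens to end in a hyphen and digits would be mangled, which is exactly why the fix belongs with the publisher and not the consumer.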
My data consistently shows that publishers who transition to minimum standards of good data design see public use of their data double within a few weeks and rise continuously thereafter. Otherwise, public use spikes when people discover the convenience, then drops to nearly nothing when identifiers change and things stop working like they are supposed to.
On Wednesday, April 17, 2013 5:13:02 AM UTC-7, Klaus Nji wrote:
The current version of routes.txt has two entries for each route number. For route 96, we have entries with route_id 96-140 and 96-145. Yet when I run some queries to return all stops being served by route 96-145 for the given service calendar, I get nothing. Either there is a problem with the way I am querying for these stops, which is unlikely, or routes.txt is listing routes that are currently not being served by the accompanying service calendar.
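The second possibility can be checked without involving stops at all. The sketch below is only an illustration: it reads routes.txt and trips.txt as CSV text and lists the route_id values that have no trips, optionally restricted to a set of service_id values. The field names route_id and service_id come from GTFS; the function name and its parameters are hypothetical.

```python
import csv
import io


def routes_without_trips(routes_csv: str, trips_csv: str, active_service_ids=None):
    """Return route_ids from routes.txt that have no trips in trips.txt.

    If active_service_ids is given, only trips whose service_id is in that
    set count as serving a route.
    """
    served = set()
    for trip in csv.DictReader(io.StringIO(trips_csv)):
        if active_service_ids is None or trip["service_id"] in active_service_ids:
            served.add(trip["route_id"])

    return [
        route["route_id"]
        for route in csv.DictReader(io.StringIO(routes_csv))
        if route["route_id"] not in served
    ]
```

The two arguments would be the contents of the feed's files, for example `open("routes.txt", encoding="utf-8-sig").read()`. If 96-145 shows up in the result, the empty query comes from the data and not from the query.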
GTFS indicates that route_id is a field that uniquely identifies a route. It does not explicitly say physical route, but one could safely assume so, as this is what other agencies base their data upon. For Ottawa, there is only one route 96, so having two entries for route 96, albeit with different suffixes, is incorrect. If you look at routes.txt for any other agency, say Toronto, there is a single entry for each route, which is correct. This also means that the content of routes.txt does not have to change each time a new schedule is published. With the way OC Transpo has routes.txt defined, however, this file will always change when a new data set is published.
So my questions are:
1. Why are there two listings for each physical route? Say 96-140 and 96-145?
2. Also, why not conform to the standard and have route_id be listed as 96, instead of 96-140? After all, the head signs of the buses running on route 96 do not say 96-140. They say 96/Hurdman or Kanata...
TIA.
Klaus