Ambiguity in stop locations on routes

Stuart Heinrich

May 10, 2012, 1:32:09 PM

to gtfs-c...@googlegroups.com

A GTFS trip may be associated with a shape_id that describes the actual path taken. A common need is to render the portion of a trip between two of its stops. Ideally, one would use the "shape_dist_traveled" field in stop_times.txt for this. However, because the field is optional, it is not always present. For example, SF MUNI provides a shape_id for every trip, but does not provide "shape_dist_traveled" for any of its stops.
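To make the use case concrete, here is a minimal sketch of cutting a shape between two stops when distances are available. The function name `slice_shape` and its arguments are hypothetical, and it assumes the shape points are already sorted by shape_pt_sequence with non-decreasing shape_dist_traveled values.

```python
from bisect import bisect_left, bisect_right


def slice_shape(points, dists, d_start, d_end):
    """Return the part of a shape between two along-shape distances.

    points: list of (lat, lon) shape points, in shape_pt_sequence order
    dists:  shape_dist_traveled for each point (non-decreasing, same units
            as d_start and d_end)
    """
    if d_start > d_end:
        raise ValueError("d_start must not exceed d_end")

    def interpolate(d):
        # Locate the segment containing distance d, then interpolate linearly.
        i = bisect_right(dists, d) - 1
        i = max(0, min(i, len(points) - 2))
        span = dists[i + 1] - dists[i]
        t = 0.0 if span == 0 else (d - dists[i]) / span
        t = max(0.0, min(t, 1.0))
        (lat0, lon0), (lat1, lon1) = points[i], points[i + 1]
        return (lat0 + t * (lat1 - lat0), lon0 + t * (lon1 - lon0))

    # Shape points strictly between the two distances are kept unchanged.
    lo = bisect_right(dists, d_start)
    hi = bisect_left(dists, d_end)
    return [interpolate(d_start)] + points[lo:hi] + [interpolate(d_end)]
```

With this in hand, rendering the path between two stops only needs their two shape_dist_traveled values, which is exactly what is missing when the field is omitted.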

It seems, then, that in order to make use of the shapes, the distance along the shape must be calculated from the lat/long position of each stop. Given a shape path and a point, one can search for the path segment with the minimum orthogonal distance to the point (extrapolating linearly a small amount beyond the beginning and end of the path, if necessary). In general this works, but there is no guarantee that consecutive stops will project to monotonically increasing distances along the path. Exceptions are not uncommon, especially in areas of high path curvature.
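A minimal sketch of this nearest-segment projection, and of how it can go wrong, might look like the following. It assumes the shape and stops have already been converted to planar x/y coordinates in meters, and it clamps to the ends of the path rather than extrapolating beyond them; the function names are hypothetical.

```python
import math


def project_onto_segment(p, a, b):
    """Return (offset, along) for point p projected onto segment a-b.

    offset: distance from p to its closest point on the segment
    along:  distance from a to that closest point
    """
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay), 0.0
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment (no extrapolation)
    qx, qy = ax + t * dx, ay + t * dy
    return math.hypot(px - qx, py - qy), t * math.sqrt(seg_len_sq)


def nearest_distance_along(shape, stop):
    """Distance along the shape of the point on it closest to the stop."""
    best_offset, best_along = math.inf, 0.0
    travelled = 0.0
    for a, b in zip(shape, shape[1:]):
        offset, along = project_onto_segment(stop, a, b)
        if offset < best_offset:
            best_offset, best_along = offset, travelled + along
        travelled += math.hypot(b[0] - a[0], b[1] - a[1])
    return best_along


# A hairpin: out along y=0, a short connector, then back along y=10.
hairpin = [(0, 0), (100, 0), (100, 10), (0, 10)]
first_stop = (80, 6)    # served on the outbound leg, but closer to the return leg
second_stop = (100, 5)  # on the connector
```

Here the first stop projects to 130 m along the shape and the second to 105 m, so the later stop appears to come before the earlier one even though each projection is individually the closest match.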

The requirement of monotonicity is important, not just for rendering the path segment, but also for computing the physical road distance between two stops (as opposed to distance as the crow flies). Some routing algorithms may require knowledge of this distance, and if it is not monotonic, then it would appear that a future stop has negative distance along the route from a previous stop.

Does anyone have suggestions for how best to resolve this issue (that is, how to identify the index at which each stop falls on a trip's shape path) under the current protocol?

It seems to me that this issue merits some tightening of the protocol itself. If a trip has a shape_id, then it should be mandatory for stop_times to include some way of knowing where on the shape the stops fall. In theory, shape_dist_traveled provides this information, but I do not think it is the best way to specify it, because it requires calculating geodesic distances and there is no one standard way to do so. One person might use a spherical approximation, another a WGS84 approximation, and another a Euclidean approximation. It would probably be asking too much to expect all agencies to measure distances using the same system (although it would not be asking too much to at least give a recommendation in the documentation). Different approximations could cause stops to be projected to different indices along the route path, which can lead to the same problem mentioned above. Therefore, I think a cleaner approach would be to specify the index into the shape path at which each stop should be inserted.
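To illustrate how the choice of distance formula changes the numbers, here is a small sketch comparing a spherical (haversine) distance with a locally Euclidean (equirectangular) one. The Earth radius constant and the function names are assumptions made for this example; a WGS84 ellipsoidal distance would need a more involved method or a geodesy library and is not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def equirectangular_m(lat1, lon1, lat2, lon2):
    """Flat-plane approximation around the mean latitude, in meters."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def cumulative(points, dist):
    """Running distance along a list of (lat, lon) points using dist()."""
    total, out = 0.0, [0.0]
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        total += dist(lat1, lon1, lat2, lon2)
        out.append(total)
    return out
```

Over city-scale segments the two formulas agree closely, but the cumulative values are not identical, so a stop distance computed with one formula will not land at exactly the same position on a shape whose distances were computed with another.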

Brian Ferris

May 11, 2012, 4:04:41 AM

to gtfs-c...@googlegroups.com

You might look at the stop-to-shape matching code from OneBusAway:

https://github.com/OneBusAway/onebusaway-application-modules/blob/master/onebusaway-transit-data-federation-builder/src/main/java/org/onebusaway/transit_data_federation/bundle/tasks/transit_graph/DistanceAlongShapeLibrary.java

It's relatively robust in matching, even in cases where there are looped or complex shapes.

Brian

Stuart Heinrich

May 11, 2012, 1:37:37 PM

to gtfs-c...@googlegroups.com

Thank you, Brian. If I may summarize the logic behind that code, as well as the ShapePointsLibrary.java that it references, the basic algorithm appears to be:

1) For every stop location, compute a list of potential insertion indices into the shape list based on orthogonal projected distance.

2) Treat the problem as a global search problem, whereby one searches for a "path" through the possible assignment tree such that:

a) the distance along the path for every stop must monotonically increase relative to the previous stop (this defines a valid path), and

b) if there are multiple valid paths, then use the path where the sum of orthogonal distances is minimized.
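The two steps above can be sketched as a small dynamic program. This is an illustration of the idea as summarized here, not the OneBusAway code: the function name `match_stops` and the input layout are hypothetical, and it accepts non-decreasing distances so that two stops may share a position.

```python
import math


def match_stops(candidates):
    """Pick one shape position per stop so that distances never decrease.

    candidates[i] is a list of (dist_along, offset) pairs for stop i, where
    offset is the orthogonal distance from the stop to the shape.
    Returns a list of chosen dist_along values with minimal total offset,
    or None if no valid assignment exists.
    """
    if not candidates or any(not c for c in candidates):
        return None
    # cost[j]: best total offset for stops 0..i when stop i uses candidate j.
    cost = [offset for _, offset in candidates[0]]
    back = []  # back[i - 1][j]: candidate index chosen for stop i - 1
    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_cost, pointers = [], []
        for dist, offset in candidates[i]:
            best, best_k = math.inf, None
            for k, (prev_dist, _) in enumerate(prev):
                if prev_dist <= dist and cost[k] < best:
                    best, best_k = cost[k], k
            new_cost.append(best + offset)
            pointers.append(best_k)
        cost = new_cost
        back.append(pointers)
    j = min(range(len(cost)), key=cost.__getitem__)
    if math.isinf(cost[j]):
        return None  # no monotonic path through the candidates
    result = []
    for i in range(len(candidates) - 1, -1, -1):
        result.append(candidates[i][j][0])
        if i > 0:
            j = back[i - 1][j]
    return result[::-1]
```

For example, if the first stop has candidates at 80 m (offset 6) and 130 m (offset 4) and the second stop has a single candidate at 105 m, picking the nearest candidate for each stop independently gives 130 then 105, which is invalid, while this search returns 80 then 105. The cost is quadratic in the number of candidates per stop, which is one reason a real implementation may give up when there are too many.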

The code identifies many exceptional cases in which the algorithm simply admits that it cannot find a match (such as when the number of possible matches is too large to search at reasonable computational expense). In these cases the algorithm deems the feed "invalid", but this is not a universal standard for rejection that feed producers could easily test against.

Personally, I think that the complexity of this code, together with the above-mentioned lack of universal agreement on what is valid, highlights the benefit of specifying stop indices into the path directly rather than leaving it to implementation-dependent heuristics.

--
Stuart Heinrich, Ph.D
Lumatic, Inc
(802) 922-0731

Brian Ferris

May 12, 2012, 5:08:59 AM

to gtfs-c...@googlegroups.com

The spec already has a mechanism for exactly specifying the semantics of stop-to-shape matching, namely the shape_dist_traveled fields in shapes.txt and stop_times.txt. It seems like your main argument is that things would be simpler for feed consumers if those were required fields instead of optional?

I agree with that argument, but I'd also point back to one of the core principles of GTFS: we favor making things simpler for feed producers, at the expense of more complexity for feed consumers. Making it a requirement that every feed with shape data also specify distance along shape values puts the burden on GTFS producers to solve the potentially tricky task (as the complexity of my code suggests) of doing stop-to-shape matching. Considering that many agencies maintain their schedule and GIS data in separate systems, or that some agencies are simply tracing route shapes by hand in Google Earth, I think the burden on agencies would be too high if we made distance along shape values a required part of the spec.

By moving the burden to GTFS consumers, it's true that we have to write some potentially complex code to do stop-to-shape matching, but there are fewer of us and it's probably a reasonable generalization that we seem to enjoy the task of writing potentially complex code ; ) so it really doesn't bother me too much.

Brian

Bradley Tollison

May 13, 2012, 4:39:25 AM

to gtfs-c...@googlegroups.com

I agree with Brian. My agency isn't massive, and it'd be a huge burden to include and maintain that data. Neighboring agencies seem to struggle with GTFS in general too, so I think avoiding complexity is key to keeping existing agencies and bringing on new ones that aren't large.

John L

May 14, 2012, 8:03:05 AM

to gtfs-c...@googlegroups.com

KISS

We all know it's not about Valentine's Day.

If it is kept optional, the design can accommodate it whether it is present or not. If it is mandatory, feeds will stop being produced.
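On the consumer side, accommodating an optional field might look like the sketch below. The function name `stop_distances` and the `fallback` callable are hypothetical; rows are assumed to be dicts such as those produced by `csv.DictReader` for one trip's stop_times.txt entries, ordered by stop_sequence.

```python
def stop_distances(stop_times, fallback):
    """Return one along-shape distance per stop for a single trip.

    stop_times: list of dict rows for the trip, ordered by stop_sequence
    fallback:   callable that estimates the distances from the rows when the
                feed does not supply them (for example, stop-to-shape matching)
    """
    raw = [(row.get("shape_dist_traveled") or "").strip() for row in stop_times]
    # Use the feed's own values only when every stop has one; a partially
    # filled column is treated the same as a missing one in this sketch.
    if raw and all(raw):
        return [float(value) for value in raw]
    return fallback(stop_times)
```

A consumer written this way works with feeds that include the field and with feeds that omit it, at the cost of maintaining the fallback path.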

I for one do not actively seek out and write code for corner cases at work. Many budget-driven efforts tied to operating the transit property take priority, namely safety and efficiency. Customer service is up and coming, but the first steps are AVIS and fare payment and collection systems. This work tends to focus on the property itself rather than on the region, but that is changing as well. At the NY MTA, designs are always made with GTFS and SIRI feed generation in mind.

I would be hard pressed to spend taxpayer money on projects that satisfy only a few feeds and from which the property gains no direct measure of success.

If the FRA, FTA, FAA, DOT, etc. came along and said here is your new standard, funding would not be an issue. Frankly, this is where I ultimately see this going, but not in my career's lifetime. If the US DOT adopted a standard like GTFS, it would more than likely go the route of NaPTAN.

At Metro-North Railroad, as our GIS is becoming available, we hope to incorporate it into the shape files. This will probably not happen until late 2013 as our resources are focused on those other areas of safety, efficiency and customer service.

Stuart Heinrich

May 14, 2012, 3:05:42 PM

to gtfs-c...@googlegroups.com

Thank you all for your input on the subject. I think you are right, it's better to keep it simple for feed producers -- the field should not be made mandatory.
