Michael Smith · Chief Technology Officer · NextBus Inc.
Direct (510) 995-3207 · Mobile (415) 260-4700
msm...@nextbus.com
Mike, belatedly,
Congrats on getting up and running so quickly, and thanks for being one of the major clients of the Bus Time API. With respect to predictions, I have a favor to ask.
But first, a little context, much of which you personally won’t need. To be truly awesome, a bus arrival-time prediction system (such as yours) needs to know not just what route or trip a bus is serving, but what “block” (i.e., sequence of trips) it’s on. Without this, one can’t make predictions about what will happen after a bus reaches the end of its current trip.
Currently, we do not support assigning vehicles to blocks, or even to specific trips in the schedule. What we’re doing now is, primarily, picking trips that seem to be a good probabilistic match for a given bus at a given location and time, showing a given destination sign. Hence the disclaimer on the developer API (http://bustime.mta.info/wiki/Developers/SIRIVehicleMonitoring):
At this moment, the MTA Bus Time system does not have a formal integration with the schedule. It cannot be stressed enough that GTFS trip ID's in SIRI VM responses are used only to indicate a particular stopping pattern, not that a given bus is actually serving a particular scheduled trip.
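To make the idea of a “probabilistic match” concrete, here is a minimal sketch. It assumes each candidate trip is reduced to a headsign plus a list of (distance along route, scheduled time) points, and it picks the headsign-matching trip whose schedule is closest to the observed time at the bus’s position. The `Trip` class, both function names, the sample data, and the nearest-schedule-deviation scoring rule are all hypothetical illustrations. This is not the Bus Time implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Trip:
    trip_id: str
    headsign: str
    # (distance along route in meters, scheduled time in seconds after midnight),
    # sorted by distance
    schedule: List[Tuple[float, int]]


def scheduled_time_at(trip: Trip, distance: float) -> Optional[float]:
    """Linearly interpolate the scheduled time at a distance along the trip."""
    pts = trip.schedule
    if not pts or distance < pts[0][0] or distance > pts[-1][0]:
        return None  # position is not covered by this trip's stopping pattern
    for (d0, t0), (d1, t1) in zip(pts, pts[1:]):
        if d0 <= distance <= d1:
            if d1 == d0:
                return float(t0)
            return t0 + (t1 - t0) * (distance - d0) / (d1 - d0)
    return float(pts[-1][1])  # single-point schedule


def best_matching_trip(trips: List[Trip], dest_sign: str,
                       distance: float, observed_time: float) -> Optional[Trip]:
    """Return the trip whose headsign matches the destination sign and whose
    schedule is closest to the observed time at the observed position."""
    best, best_dev = None, float("inf")
    for trip in trips:
        if trip.headsign != dest_sign:
            continue
        scheduled = scheduled_time_at(trip, distance)
        if scheduled is None:
            continue
        dev = abs(observed_time - scheduled)
        if dev < best_dev:
            best, best_dev = trip, dev
    return best


# Hypothetical data: two trips with the same stopping pattern, 15 minutes apart.
trips = [
    Trip("A", "DOWNTOWN", [(0.0, 32400), (5000.0, 33000)]),
    Trip("B", "DOWNTOWN", [(0.0, 33300), (5000.0, 33900)]),
    Trip("C", "UPTOWN", [(0.0, 32400), (5000.0, 33000)]),
]
match = best_matching_trip(trips, "DOWNTOWN", 2500.0, 33000)
print(match.trip_id if match else None)  # A
```

Note that nothing in this sketch ties successive matches for the same bus together. That is exactly why the chosen trip_id says only which stopping pattern the bus is following, not which scheduled trip or block it is actually serving.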
BUT we do gather operator login data from the buses, which will allow us in the future to match them to specific blocks and trips in the schedule. This is one of the more immediate enhancements we are working on, and this is where we could use your help. Can your system detect, for example, when a bus is on a block, on a trip, but then does *not* make the *next* trip on that block? If so, would you be willing to let us know what you’re seeing?
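The check being asked about can be sketched roughly as follows. The sketch assumes two hypothetical inputs: each block’s trip_ids in scheduled order, and a time-ordered stream of (vehicle_id, trip_id) assignments, such as might eventually be derived from operator login data. The function name and data shapes are illustrative only.

```python
from typing import Dict, List, Tuple


def missed_next_trips(block_trips: Dict[str, List[str]],
                      observed: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Flag vehicles that leave a trip for something other than the block's next trip.

    block_trips: block_id -> trip_ids in scheduled order
    observed: (vehicle_id, trip_id) assignments in time order
    Returns (vehicle_id, expected_next_trip, actual_trip) tuples.
    """
    # Map each trip to the trip that follows it on the same block.
    next_on_block: Dict[str, str] = {}
    for trips in block_trips.values():
        for cur, nxt in zip(trips, trips[1:]):
            next_on_block[cur] = nxt

    last_trip: Dict[str, str] = {}
    misses: List[Tuple[str, str, str]] = []
    for vehicle, trip in observed:
        prev = last_trip.get(vehicle)
        # Only look at transitions; repeated reports of the same trip are ignored.
        if prev is not None and trip != prev:
            expected = next_on_block.get(prev)
            if expected is not None and trip != expected:
                misses.append((vehicle, expected, trip))
        last_trip[vehicle] = trip
    return misses


# Hypothetical example: bus 7 finishes t2 on block B1 but then shows up on t9.
blocks = {"B1": ["t1", "t2", "t3"]}
observed = [("bus7", "t1"), ("bus7", "t1"), ("bus7", "t2"), ("bus7", "t9")]
print(missed_next_trips(blocks, observed))  # [('bus7', 't3', 't9')]
```

This catches only a bus that switches to a different trip. A bus that simply goes out of service never produces a new assignment, so detecting that case would also need some timeout on the expected next trip.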
I assume you must track this, because it would be a critical part of validating your predictions: the block sequence factors heavily into predictions for the first few stops on a route. It would be tremendously helpful for us to know where and when we are getting it right, and where we’re not.
Thanks,
Mike
PS: Just a heads-up. I am guessing that, despite the disclaimer, you’re inferring the block_id for a given bus from the last two elements of the (randomly selected) trip_id. E.g., for trip_id “MTA NYCT_20120108EE_090600_S76_0036_MISC_544” you are assuming that the block_id is “MISC_544.” This is close but not correct. Those two fields are the driver’s run_id, which describes the driver’s work over the course of a day (not the bus’s work).
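To make the distinction concrete, the sketch below pulls the last two underscore-separated fields out of the example trip_id. As noted above, the result is the driver’s run_id, not a block_id. The function name is hypothetical, the sketch assumes nothing about the other fields, and the second set of trip_ids is invented purely for illustration.

```python
def run_id_from_trip_id(trip_id: str) -> str:
    """Return the last two underscore-separated fields of a Bus Time trip_id.

    These identify the driver's run (the driver's day of work), NOT the block
    (the bus's day of work); a single block may be covered by several runs.
    """
    parts = trip_id.split("_")
    if len(parts) < 2:
        raise ValueError(f"unexpected trip_id format: {trip_id!r}")
    return "_".join(parts[-2:])


print(run_id_from_trip_id("MTA NYCT_20120108EE_090600_S76_0036_MISC_544"))  # MISC_544

# Hypothetical trip_ids for one block (one bus) relieved by a second driver
# between trips: grouping by run_id splits the block into two chains.
block_trips = [
    "MTA NYCT_20120108EE_060000_B63_0001_BKN_101",
    "MTA NYCT_20120108EE_073000_B63_0002_BKN_101",
    "MTA NYCT_20120108EE_090000_B63_0003_BKN_205",
]
runs = {run_id_from_trip_id(t) for t in block_trips}
print(sorted(runs))  # ['BKN_101', 'BKN_205']
```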
On Staten Island, most blocks are covered by a single driver’s run, but that is far from true generally in NYC (e.g., on the B63). In many cases in our schedules, a block (i.e., a bus) is in service for 18 or 20 hours, with drivers (and thus runs) switching at the end of a trip or in the middle of one. In this case, your current assumptions won’t allow you to predict from one trip to the next.
PPS: One of our to-dos is to republish our GTFS with the block_id column populated in trips.txt.
Mike, do I understand correctly that you plan on predicting, based on where a bus is currently, how it will perform several journeys into the future?
What happens when a vehicle is replaced? Are you going to reliably, and I mean 100% correctly, send a SIRI update to subscribers that an equipment change has occurred, with the correct vehicle number, and have it replace the vehicle number on all of the future journeys’ schedules? What about cancellations: how many journeys are affected?
The intent should be to show where the vehicle is within its current journey. A rider will not care where a particular bus will be in 8 hours, nor will a driver, for that matter. Riders do care whether the next several buses that serve this stop will arrive in x minutes, whether they have enough time to get a cup of joe before the next bus arrives, or whether they will get to their destination in the time frame they want.
What you are describing is what operations needs to see, or what dispatching needs in order to call up an additional driver.
Developers, please bring this topic forward, as I think what a customer needs is information about journeys that are currently occurring, not future ones that may occur.
As an example, you would not see a subway countdown clock show a train that has not yet left the yard or turned at its destination, reading "the seventh train from now is 75 minutes away."
I’m not diminishing what SIRI can do; I’m just not sure the right audience is being addressed.
Still, developers should weigh in.
John,
What the Department of Buses is doing with the information we’re collecting is, as you point out, a whole separate thread. But there are very important implications for the customer-facing elements. Consider the situation where you are waiting at the second stop on a bus route. Most of the time, the next bus to come to you will be one that is currently heading in the other direction on that route. As every bus rider who lives near the beginning of the line knows, it will come to the terminal, wait for a bit, and then serve your direction.
We do currently show buses on their current journey (as you propose), but I disagree that that’s the bar we should set for ourselves. As Mike Smith can tell you, most bus customer information systems do “look around the corner” of the terminal (so to speak) to give information to customers at early stops on a route. As such, having some block-level association is critical in a bus network.
This is the case even when nothing out of the ordinary is happening in operations. What to do when the real world takes hold is a whole other story: the current state of things (including schedules) may indicate that a bus is going to make the next trip back, but when it gets to the terminal at the end of the route, something happens. I’m sure Mike, and clients of NextBus’s and other APIs in other cities, have lots of war stories on this subject.
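One simple way to “look around the corner” once the block is known can be sketched as follows. The sketch makes two loudly stated assumptions. First, a bus will not leave the terminal before its next trip’s scheduled departure. Second, it needs some minimum layover, so a late-arriving bus leaves late. The function, parameter names, and the 5-minute default layover are hypothetical. This is not how Bus Time or NextBus actually compute predictions.

```python
def predict_across_terminal(eta_at_terminal: float,
                            next_trip_departure: float,
                            running_time_to_stop: float,
                            min_layover: float = 300.0) -> float:
    """Predict when a bus will reach a stop early on the next trip of its block.

    All values are in seconds (times as seconds after midnight).
    eta_at_terminal: predicted arrival at the end of the current trip
    next_trip_departure: scheduled departure of the block's next trip
    running_time_to_stop: expected travel time from the terminal to the stop
    min_layover: assumed minimum recovery time at the terminal (hypothetical)
    """
    # Leave on schedule, unless the bus arrives too late to get its minimum
    # layover, in which case it leaves late by the shortfall.
    departure = max(next_trip_departure, eta_at_terminal + min_layover)
    return departure + running_time_to_stop


# Bus due at the terminal at 08:10, next trip scheduled at 08:20, stop 4 min out.
print(predict_across_terminal(29400, 30000, 240))  # 30240 (08:24)
# Same bus running late, reaching the terminal at 08:18.
print(predict_across_terminal(29880, 30000, 240))  # 30420 (08:27)
```

Without a block association, the rider at the second stop gets no prediction at all for this bus until it actually starts its next trip.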
Thanks,
Mike
One item on our medium-length list of improvements is to back-fill and re-post the Bus Time-only GTFS with block IDs added to trips.txt. For reasons not worth explaining, we actually generate those block IDs as part of the Bus Time pre-processing step.
Once we’ve done that, you will have information linking sets of trips together more robustly than reading the run ID out of the trip IDs. That should give you what you need, since the API provides trip IDs, albeit somewhat probabilistically assigned.
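Once block_id is populated, a consumer could chain trips roughly as sketched below. The sketch groups trips by (service_id, block_id), because GTFS treats a block as trips sharing both a block_id and service days. Within each block, it orders trips by their first departure_time in stop_times.txt. The function names and the tiny inline feed are hypothetical. It also assumes trips without a block_id are simply skipped and that departure times are given at the first stop of each trip.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, TextIO, Tuple


def gtfs_seconds(hms: str) -> int:
    """Parse a GTFS HH:MM:SS time (hours may exceed 23) into seconds."""
    h, m, s = (int(x) for x in hms.strip().split(":"))
    return h * 3600 + m * 60 + s


def build_blocks(trips_txt: TextIO,
                 stop_times_txt: TextIO) -> Dict[Tuple[str, str], List[str]]:
    """Group trips by (service_id, block_id), ordered by first departure time."""
    first_dep: Dict[str, int] = {}
    for row in csv.DictReader(stop_times_txt):
        dep = (row.get("departure_time") or "").strip()
        if not dep:
            continue  # non-timepoint stop without a time
        t = gtfs_seconds(dep)
        trip = row["trip_id"]
        if trip not in first_dep or t < first_dep[trip]:
            first_dep[trip] = t

    blocks: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row in csv.DictReader(trips_txt):
        block_id = row.get("block_id")
        if block_id and row["trip_id"] in first_dep:
            blocks[(row["service_id"], block_id)].append(row["trip_id"])
    for trip_ids in blocks.values():
        trip_ids.sort(key=first_dep.__getitem__)
    return dict(blocks)


def next_trip_on_block(blocks: Dict[Tuple[str, str], List[str]],
                       trip_id: str) -> Optional[str]:
    """Return the trip the same vehicle is scheduled to make next, if any."""
    for trip_ids in blocks.values():
        if trip_id in trip_ids:
            i = trip_ids.index(trip_id)
            return trip_ids[i + 1] if i + 1 < len(trip_ids) else None
    return None


# Hypothetical minimal feed.
TRIPS_TXT = """route_id,service_id,trip_id,block_id
S76,WKD,T2,BLK1
S76,WKD,T1,BLK1
S76,WKD,T3,BLK2
"""
STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,06:00:00,06:00:00,S1,1
T1,06:40:00,06:40:00,S2,2
T2,06:50:00,06:50:00,S2,1
T2,07:30:00,07:30:00,S1,2
T3,25:10:00,25:10:00,S1,1
"""
blocks = build_blocks(io.StringIO(TRIPS_TXT), io.StringIO(STOP_TIMES_TXT))
print(next_trip_on_block(blocks, "T1"))  # T2
```

Combined with the trip_ids already in the API, this gives a lookup from a bus’s current trip to its scheduled next trip, independent of which driver’s run covers it.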
Since this aspect of Bus Time is outside the core pipeline, I think the folks working on the software might welcome some outside contributions to help get it moving faster. The whole back end is open source, after all.
Thanks,
Mike