NextBus arrival predictions now available, including via API

Michael Smith

Jan 16, 2012, 4:06:20 PM

to mtadevelop...@googlegroups.com

MTA developers,

As of this morning NextBus has made available real-time arrival predictions for the Staten Island system. This enables passengers to find out how many minutes away a bus is (e.g. 4 minutes) instead of how far it is away (e.g. 1.2 miles). Our system is based on GPS data provided by the BusTime project. Congrats to the NYC MTA on getting that fully working!

The NextBus Staten Island system uses the same software as our other projects, including Boston MBTA, Washington DC WMATA, Toronto TTC, San Francisco Muni, Los Angeles Metro, AC Transit, and many more. We were therefore able to create the system in just a couple of days and still be confident that it works well. Our predictions are based on historical travel times (as well as other data), which means they will become more accurate over time, though they already appear to be accurate enough to be very useful to passengers. Also, the configuration in the current GTFS data has some problems: a few routes have glitches in their patterns. NYC MTA will fix these issues relatively soon as they are reported to them. When those trip patterns are fixed in the GTFS data, NextBus will update our system with the new data.

You can see the NextBus system by going to nextbus.com. If you do this on a smartphone our web app will automatically show you predictions for the nearest stops. One can also get the real-time arrival predictions via our telephone system and via SMS.

But you are probably most interested in using the NextBus API to create even better apps! If you are so inclined, you can access the documentation for the feed at http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf . A nice bonus is that the feed currently works with a large number of other agencies as well, so if you create an app for Staten Island it will also be usable with the other transit agencies. See the current list of the 35 agencies on the NextBus API at http://webservices.nextbus.com/service/publicXMLFeed?command=agencyList
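
A minimal Python sketch of working with the agencyList command mentioned above. The base URL is the one quoted in this message; the helper names are hypothetical, and the sample XML (an `agency` element carrying `tag`, `title` and `regionTitle` attributes) is a made-up stand-in, so check the feed documentation for the actual response layout before relying on it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL taken from the message above.
FEED_URL = "http://webservices.nextbus.com/service/publicXMLFeed"


def build_feed_url(command, **params):
    """Build a feed URL for a command such as 'agencyList' (hypothetical helper)."""
    query = {"command": command}
    query.update(params)
    return FEED_URL + "?" + urlencode(query)


def parse_agency_list(xml_text):
    """Return (tag, title) pairs for every <agency> element in the response."""
    root = ET.fromstring(xml_text)
    return [(a.get("tag"), a.get("title")) for a in root.iter("agency")]


# Made-up sample response; element and attribute names are an assumption.
SAMPLE_XML = """<body>
  <agency tag="example-a" title="Example Agency A" regionTitle="Region 1"/>
  <agency tag="example-b" title="Example Agency B" regionTitle="Region 2"/>
</body>"""

if __name__ == "__main__":
    print(build_feed_url("agencyList"))
    for tag, title in parse_agency_list(SAMPLE_XML):
        print(tag, "-", title)
```

In a real app the XML text would come from fetching the URL built by `build_feed_url("agencyList")` rather than from the sample string.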

So develop away. You can drive a lot of users to your app by letting them know exactly when their bus is going to arrive.

And three cheers for open data!

--

Michael Smith · Chief Technology Officer · NextBus Inc.
Direct (510) 995-3207 · Mobile (415) 260-4700
msm...@nextbus.com

Webtech Wireless | InterFleet | NextBus | Quadrant | Telematics for the Planet ®

Frumin, Michael

Feb 2, 2012, 3:32:43 PM

to mtadevelop...@googlegroups.com

Mike, belatedly,

Congrats on getting up and running so quickly, and thanks for being one of the major clients of the Bus Time API. With respect to predictions, I have a favor to ask.

But first a little context, much of which you personally won’t need. To be truly awesome, a bus arrival-time prediction system (such as yours) needs to know not just what route or trip a bus is serving, but what “block” (i.e. sequence of trips) it’s on. Without this, one can’t make predictions about things that will happen after a bus reaches the end of its current trip.

Currently, we are not supporting assignment of vehicles to blocks, or even to specific trips in the schedule. What we’re currently doing can be explained as, primarily, picking trips that seem to be a good probabilistic match for a given bus in a given location at a given time with a given destination sign. Thus the disclaimer on the developer API (http://bustime.mta.info/wiki/Developers/SIRIVehicleMonitoring):

At this moment, the MTA Bus Time system does not have a formal integration with the schedule. It cannot be stressed enough that GTFS trip ID's in SIRI VM responses are used only to indicate a particular stopping pattern, not that a given bus is actually serving a particular scheduled trip.

BUT, we do gather from the buses operator login data that will allow us in the future to match to specific blocks and trips from the schedule. This is one of the more immediate enhancements we are working on, and this is where we could use your help. Do you have the ability in your system to detect, for example, when a bus is on a block, on a trip, but then does *not* make the *next* trip on that block? If so, would you be willing to let us know what you’re seeing?

I assume you must track this because it would be a critical part of validating your predictions, since the block sequence factors heavily into predictions for the first few stops on a route. It would be tremendously helpful for us to know where/when we are getting it right, and when we’re not.

Thanks,

Mike

PS just a heads up – I am guessing that, despite the disclaimer, you’re assuming the block_id for a given bus based on the last 2 elements of the (randomly selected) trip_id. E.g., for trip_id “MTA NYCT_20120108EE_090600_S76_0036_MISC_544” you are assuming that the block_id is “MISC_544”. This is close but not correct. Those 2 fields are the driver’s run_id, which describes the driver’s work over the course of a day (not the bus’s work).

On Staten Island most blocks are covered by a single driver’s run, but that is far from true generally in NYC (e.g., on the B63). In many cases in our schedules a block (i.e. bus) is in service for 18 or 20 hours, with drivers (and thus runs) switching at the end of a trip or in the middle of a trip. In those cases, your current assumptions won’t allow you to predict from one trip to the next.
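
To make the trip_id point concrete, here is a small Python sketch that pulls out the last two underscore-separated fields of a trip_id, which the PS identifies as the driver's run_id. The function name and the assumption that every trip_id has at least three underscore-separated fields are illustrative only; the meaning of the remaining fields is not described in this thread, so they are returned unparsed.

```python
def split_trip_id(trip_id):
    """Split a trip_id into (prefix, run_id).

    Per the note above, the last two underscore-separated fields are the
    driver's run_id. They are NOT the block_id, even though on Staten
    Island the two often coincide.
    """
    parts = trip_id.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError("unexpected trip_id format: %r" % trip_id)
    prefix, run_part_1, run_part_2 = parts
    return prefix, run_part_1 + "_" + run_part_2


if __name__ == "__main__":
    example = "MTA NYCT_20120108EE_090600_S76_0036_MISC_544"
    prefix, run_id = split_trip_id(example)
    print(run_id)
```

Two trips sharing a run_id were worked by the same driver, which is only a hint that the same bus served both.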

PPS one of our todo’s is to republish our GTFS with the block_id column populated in trips.txt

John L

Feb 2, 2012, 7:43:52 PM

to mtadevelop...@googlegroups.com

Mike, is my understanding correct that you plan on predicting, based on where a bus is currently, how it will perform several journeys into the future?

What happens when a vehicle is replaced? Are you going to reliably, and I mean 100% correctly, send a SIRI update to subscribers that an equipment change has occurred with the correct vehicle number? That replaces all of the future journeys' schedules with this vehicle number? What about cancellations, how many journeys are affected?

The intent should be to show where the vehicle is within its current journey. A rider will not care where a particular bus will be in 8 hours, nor a driver for that matter. They do care if the next several buses that make this stop will arrive in x minutes. Or if the rider has enough time to get a cup of joe before the next bus arrives. Or will the rider get to their destination in the time frame they want.

What you are describing is what operations needs to see or dispatching to call up an additional driver.

Developers, please bring this topic forward as I think what a customer needs is about current journeys that are occurring, not future ones that may occur.

As an example, you would not see a subway countdown clock show a train that has not left a yard or turned at its destination and read "the seventh train from now is 75 minutes away".

Not diminishing what SIRI can do, just not sure if the right audience is being addressed.

However, Developers should weigh in.

Frumin, Michael

Feb 3, 2012, 9:02:55 AM

to mtadevelop...@googlegroups.com

John,

What Department of Buses is doing with the information we’re collecting is, as you point out, a whole separate thread. But there are very important implications for the customer-facing elements. Consider the situation where you are waiting at the 2nd stop on a bus route. Probabilistically, most of the time, the next bus to come to you will be a bus that is currently heading in the other direction on that route. It will, as every bus rider who lives near the beginning of the line knows, come to the terminal, wait for a bit, and then serve your direction.

We do currently show buses on their current journey (as you propose), but I disagree with your claim that that’s the bar we should set for ourselves. As Mike Smith can tell you, most bus customer info systems do “look around the corner” of the terminal (so to speak) to give information to customers at early stops on a route. As such, having some block-level association is critical in a bus network.

This is the case even where no interesting operations are happening. What to do when the real world takes hold is a whole other story, i.e. the current state of things (including schedules) may indicate that a bus is going to make the next trip back, but when it gets to the terminal at the end of the route something happens. I’m sure Mike, and clients of NextBus’ and other APIs in other cities, have lots of war stories on this subject.
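
As a rough illustration of "looking around the corner" of the terminal, the Python sketch below estimates when a bus finishing its current trip could reach an early stop on its next trip. It is not NextBus's or Bus Time's actual method; the function names, the fixed minimum layover, and the single travel-time input are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def predict_next_trip_departure(terminal_arrival, scheduled_departure,
                                min_layover=timedelta(minutes=2)):
    """Estimate when a bus starts the next trip on its block.

    Assumption: the bus leaves at its scheduled time unless it arrives too
    late, in which case it leaves after a minimum layover.
    """
    earliest_possible = terminal_arrival + min_layover
    return max(scheduled_departure, earliest_possible)


def predict_stop_arrival(terminal_arrival, scheduled_departure,
                         travel_time_to_stop,
                         min_layover=timedelta(minutes=2)):
    """Estimate arrival at an early stop on the next trip of the block."""
    departure = predict_next_trip_departure(
        terminal_arrival, scheduled_departure, min_layover)
    return departure + travel_time_to_stop


if __name__ == "__main__":
    arrival = datetime(2012, 2, 3, 9, 12)   # predicted arrival at the terminal
    sched = datetime(2012, 2, 3, 9, 10)     # scheduled start of the next trip
    print(predict_stop_arrival(arrival, sched, timedelta(minutes=3)))
```

Without a block-level link between the inbound trip and the outbound trip, there is no `terminal_arrival` to feed into a calculation like this, which is the dependency the message describes.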

Thanks,

Mike

Michael Smith

Feb 3, 2012, 2:03:05 PM

to mtadevelop...@googlegroups.com

Mike,

You are correct in that to predict into the future a system needs to know what the vehicle is assigned to do. For NextBus systems we obtain this assignment data from a variety of sources including feeds of block, run, or trip info plus of course GPS data and configuration information. But these sources are all far from perfect. Plus one has to take into account the real-world operational issues of a transit system. When you work with a lot of agencies like NextBus does you really start to appreciate the complexity involved in running a transit agency.

As an example, when we first did the MTA Brooklyn system we found that the trip information was not adequately accurate, as was disclosed by NYC MTA. So we had to greatly deemphasize it when generating our predictions. With Staten Island we do use the run information as a hint of what the system will do. After all, it is the only applicable information that is available from the feed (as far as we can tell). But we also use a great deal of software that was created for other transit agencies to determine what a bus really is going to do in the future. Some developers think that it is easy to generate predictions based on traffic conditions and such. Turns out that is a relatively small part of the problem (though still extremely important!). The most complicated part of our software is figuring out what a vehicle is supposed to be doing in the future in the first place. The end result is that the third-party developers don't need to worry about the complexities behind the real-time info. Instead, they can concentrate on creating their apps.

NextBus does have internal tools for determining when there are assignment problems. We spend a good deal of effort dealing with contractual obligations such as making sure our predictions are accurate. But given our many obligations we cannot help NYC MTA determine problems with the BusTime assignments.

Michael Smith · Chief Technology Officer · NextBus Inc.
Direct (510) 995-3207 · Mobile (415) 260-4700
msm...@nextbus.com

Webtech Wireless | InterFleet | NextBus | Quadrant | Telematics for the Planet ®

John Larsen

Feb 4, 2012, 8:54:46 AM

to mtadeveloperresources

Michael,

I see an asterisk on NextBus for many of the transit operators, with an
explanation of

"An asterisk indicates that the predicted vehicle has not yet departed
its terminal. The prediction is based on scheduled departure time as
well as other information. However, actual departure time may change
due to unforeseen circumstances."

Is this because you do not have access to the information, or because the
information from the transit operator is too "stale" to accurately predict
a change in service, so you fall back to the next best thing, which is what
should happen for a future journey?

I am just trying to understand the need and want here. What I see are
two scenarios (more yes but trying to keep it simple), a rider that is
about to (within 45 minutes, but usually in the next 15 minutes) board
a transit vehicle and a rider that is planning to use a transit
vehicle in the future (beyond 45 minutes and usually a few hours but
within the same day). You have to abandon your typical market segments
here because commuters and discretionary riders act the same way in
most cases, they will travel now or some time in the future.

Do you have any statistics regarding these two types of riders that
use your systems? My guess here is that you have a lot of feedback for
those that fall into the first category but not so much in the second.
It would also be interesting to see the demographics that fall into
these categories, to see whether you are hitting most of the market
you are targeting. For instance, 80% of your discretionary
travel might be by 18 - 24 year olds who make up 12% of your fare
collection, or 30% of commuters might be 35 - 40 and 40 - 45
year olds who make up 80% of your fare collection.

Are you able to share these statistics if you have them?

Sunny

Feb 4, 2012, 9:42:39 PM

to mtadeveloperresources

Maybe if the MTA could release the run cut, we could help reduce or
eliminate the task of trying to figure out what a bus will do next?

Frumin, Michael

Feb 5, 2012, 2:49:36 AM

to mtadevelop...@googlegroups.com

Sunny,

On our medium-length list of improvement items is to back-fill and re-post the bus time-only GTFS, with the block ID's added into the trips.txt. For reasons not worth explaining, we are actually generating those block ID's as part of the bus time pre-processing step.

Once we've done that, you will have the information linking sets of trips together more robustly than by reading the run ID out of the trip ID's. That should give you what you need, since the API has trip ID's, albeit somewhat probabilistically assigned.
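
Once block_id is populated in trips.txt, linking trips together could look something like the Python sketch below. The column names (`route_id`, `service_id`, `trip_id`, `block_id`) are standard GTFS fields; the sample rows and the function name are made up for illustration and are not taken from the MTA feed.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in GTFS trips.txt layout; the last row has no block_id.
SAMPLE_TRIPS_TXT = """route_id,service_id,trip_id,block_id
S76,EXAMPLE_SVC,trip_1,block_A
S76,EXAMPLE_SVC,trip_2,block_A
S74,EXAMPLE_SVC,trip_3,block_B
S76,EXAMPLE_SVC,trip_4,
"""


def trips_by_block(trips_file):
    """Group trip_ids by block_id, skipping trips with no block assigned."""
    blocks = defaultdict(list)
    for row in csv.DictReader(trips_file):
        block_id = (row.get("block_id") or "").strip()
        if block_id:
            blocks[block_id].append(row["trip_id"])
    return dict(blocks)


if __name__ == "__main__":
    for block_id, trip_ids in trips_by_block(io.StringIO(SAMPLE_TRIPS_TXT)).items():
        print(block_id, trip_ids)
```

Note that trips.txt alone does not give the order of trips within a block; sorting them would need the departure times from stop_times.txt.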

Since this is an aspect of bus time that is outside of the core pipeline, I think those folks working on the software might welcome some outside contributions to help get it moving faster. The whole back end is open source after all.

Thanks,
Mike

Michael Smith

Feb 6, 2012, 2:12:58 PM

to mtadevelop...@googlegroups.com, nextbus-a...@googlegroups.com

Since this question relates to NextBus agencies I've also cc'd the nextbus-api-discuss list.

The reason we indicate for certain agencies when vehicles have not yet departed the terminal is because we have found it is useful to passengers to get more than just a simple prediction such as "5 minutes". For some agencies we have found that drivers do not always start their trips when they are supposed to. We might be tracking the vehicle and we know when the vehicle should leave but unfortunately sometimes drivers simply do not do what they are supposed to. Therefore the predictions for when a vehicle has not yet left the terminal are not as accurate as predictions that are based purely on GPS and historic travel times. The asterisk therefore provides the additional information to the passenger that the particular prediction is not as reliable, because it also depends on driver behavior.
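
For app developers who want to mirror this asterisk convention in their own displays, a minimal Python sketch follows. The function name and the `departed_terminal` flag are hypothetical; how the feed actually signals that a vehicle has not yet left the terminal should be checked in the feed documentation.

```python
def format_prediction(minutes, departed_terminal):
    """Format a prediction, adding '*' when the bus has not left its terminal.

    The asterisk marks predictions that also depend on the driver starting
    the trip on time, so they are less reliable than GPS-based ones.
    """
    unit = "minute" if minutes == 1 else "minutes"
    text = "%d %s" % (minutes, unit)
    if not departed_terminal:
        text += "*"
    return text


if __name__ == "__main__":
    print(format_prediction(5, True))
    print(format_prediction(12, False))
```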

Therefore it has nothing to do with stale information or a change in service.

It should also be noted that this is a very political issue. Basically we are saying that sometimes the drivers are not doing what they are supposed to be doing. Therefore we have only enabled this feature on certain agencies. You would be surprised how political these issues can sometimes get.

Michael Smith · Chief Technology Officer · NextBus Inc.
Direct (510) 995-3207 · Mobile (415) 260-4700
msm...@nextbus.com

Webtech Wireless | InterFleet | NextBus | Quadrant | Telematics for the Planet ®

John L

Feb 6, 2012, 3:08:48 PM

to mtadevelop...@googlegroups.com

I see. Thanks so much for your response.

Public transit fraught with politics? You don't say...
