b) if/how you save that data?
We are working on a very similar project right now with flight data.
Cheers
Ryan
From: mtadevelop...@googlegroups.com [mailto:mtadevelop...@googlegroups.com] On Behalf Of Emilie Evans
Sent: Monday, April 04, 2011 6:19 PM
To: mtadevelop...@googlegroups.com
Cc: Eugenia Manwelyan
Subject: [MTAdev] B63 historic bus data
Hello,
We don't disagree that time-based prediction would be a good thing to have. The reason we started with distance is that, because it is a precursor to time-based prediction, it was necessarily faster to deliver to customers and to developers. Of course, any work that the community can provide demonstrating that accurate predictions are feasible would help us move in that direction.
As for your claim that "distance from stop is not really tangible to the average bus user," please allow me to respectfully disagree. Take, for example, some selected feedback we have received regarding the B63 pilot:
"The new mobile website is fabulous!... It has changed my life."
"Best thing in years from the MTA.... no more standing in the cold for an hour.... just pop it up and put my coat on when the bus is a few blocks away. More bus's soon please...."
"I take the B63 and I find Bus Time to be astonishingly wonderful"
"Your system has already saved me so much time"
"...made my family's lives much easier as a result . I find the info to be very accurate at all times"
I'm working on getting some logged B63 data into good shape for letting you guys play with it for developing prediction algorithms and other analyses. There's one snafu with how the data is logged (it was a pilot...) that I want to correct first, and it may take a little bit of time.
Thanks,
Mike
Hi Mike,
Thanks for writing back about the B63 logged data. We are actually not looking to formulate a time-based prediction algorithm, but rather to compare the logged data with crowdsourced data for that same bus route. We hope to determine a threshold of accuracy for the crowdsourced data, as well as understand any discrepancies between the GPS-logged station arrival times and the scheduled times.
I understand that you're currently working on formatting the GPS-logged data, correct? Any idea when this data would be available for our use?
Thanks,
Eugenia Manwelyan
Masters in Urban Planning Candidate
Graduate School of Architecture, Planning and Preservation
Columbia University
Eugenia, thanks. The data prep we’re working on is for both needs. Comparison against crowdsourced data sounds very interesting.
Again, I must repeat what is discussed in the Bus Time API documentation – it is not possible to use this pilot data to compare the observed bus arrival times to the scheduled times. In this pilot, we are not formally capturing the schedule for a given bus, so even though it looks like we’ve given it a trip ID (with implied scheduled stopping times), that’s not necessarily the right trip. You may count a bus as 5 minutes late with our data when in truth it could have been 5 minutes early. This is something we are looking to remedy in the wider deployment.
So, conclusion: we’re working to get the data out for everyone’s purposes, but at this stage it can’t be used to analyze schedule adherence.
Thanks,
Mike
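To make the late-versus-early point concrete, here is a minimal sketch. The trip IDs and times are made up for illustration and are not taken from the Bus Time feed. It assumes the same observed arrival could belong to either of two consecutive scheduled trips.

```python
from datetime import datetime


def adherence_minutes(observed, scheduled):
    """Observed minus scheduled arrival, in minutes (positive means late)."""
    return (observed - scheduled).total_seconds() / 60


# One observed arrival at a stop (hypothetical value).
observed = datetime(2011, 4, 4, 8, 10)

# Hypothetical scheduled arrivals at that stop for two consecutive trips.
# In the pilot data, the trip ID attached to the bus may be either of these.
scheduled_by_trip = {
    "trip_A": datetime(2011, 4, 4, 8, 5),
    "trip_B": datetime(2011, 4, 4, 8, 15),
}

# The same observation yields opposite conclusions depending on the trip.
deviations = {
    trip: adherence_minutes(observed, scheduled)
    for trip, scheduled in scheduled_by_trip.items()
}
print(deviations)
```

If the bus is matched to trip_A it looks 5 minutes late; matched to trip_B it looks 5 minutes early. Without a reliable trip assignment there is no way to tell which is right.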
You mean data like this?
https://foursquare.com/venue/428520
<_Jym_>
If you are interested in time-based predictions you can already use the
NextBus API (http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf) for
the B63 route and also for a good number of other transit agencies. This
feed has been used by many third parties to create useful applications
such as Android/iPhone apps and even a trip planner based on real-time
data. The feed data has also been used to gather and analyze running
times, schedule adherence, etc. The feed has block assignment and other
information that can be useful for such work. NextBus also generates
actual schedule adherence information but that information is usually
only provided to the transit agencies instead of to the general public.
Mike
Everyone -- as I have mentioned now multiple times in this and other forums (including the Bus Time API documentation page), there are no accurate block assignments in our pilot MTA Bus Time system (nor in the GTFS data), so there can't possibly be any such assignments in any other feed which is layered on top of ours. Similarly, also as mentioned, because there's no formal integration with the schedule in our pilot system, no meaningful schedule adherence calculations can be made either (as compared to, say, headway adherence or running time analysis, which make sense regardless of the schedule).
Just a fair warning to everyone; we don't want people using data in ways that don't really make sense.
Thanks,
Mike
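For anyone wondering what a schedule-independent measure looks like, here is a minimal sketch of a headway calculation. It assumes you have already extracted a list of observed arrival times at a single stop; the timestamps and the function name are hypothetical and not part of any Bus Time or NextBus API.

```python
from datetime import datetime


def headways_minutes(arrivals):
    """Gaps in minutes between consecutive observed arrivals at one stop."""
    ordered = sorted(arrivals)
    return [
        (later - earlier).total_seconds() / 60
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Hypothetical observed arrivals of successive buses at the same stop.
arrivals = [
    datetime(2011, 4, 4, 8, 0),
    datetime(2011, 4, 4, 8, 12),
    datetime(2011, 4, 4, 8, 19),
]

headways = headways_minutes(arrivals)
print(headways)
```

Only observed times go into the calculation, so it does not depend on which scheduled trip or block a bus was assigned to.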