B63 historic bus data

Emilie Evans

Apr 4, 2011, 6:18:57 PM

to mtadevelop...@googlegroups.com, Eugenia Manwelyan

Hello,

My name is Emilie, and I am a master's student in urban planning at Columbia University. I understand that the B63 bus is now equipped with GPS and I'm interested in looking at some of the historic bustimes data with regard to how the scheduled times compare with the GPS times. I was hoping to do this for our final GIS project this semester, with my classmate Eugenia Manwelyan (cc-ed).

I was wondering:

a) whether there is data that we can have

b) whether and how you save that data

Please let me know if it would be possible to obtain this data.

Thanks!!

Emilie Evans
--
MS Historic Preservation, 2011
MS Urban Planning, 2011

Columbia University Graduate School of Architecture, Planning and Preservation

ece...@columbia.edu
202.413.3424

Frumin, Michael

Apr 4, 2011, 8:26:58 PM

to mtadevelop...@googlegroups.com

Emilie,

Thanks for your query. We are currently looking into this, since we see a number of potential applications of historic data. We still need to work out which data, and which format, it makes the most sense to publish.

But just to be clear up front -- when you say "looking at some of the historic bustimes data with regard to how the scheduled times compare with the GPS times" are you talking about:
- running times -- the time it takes for the bus to travel between two points

OR

- schedule deviation -- the difference between a bus's arrival time at a given stop and the scheduled time at that stop

?

Personally, I think the former is much more interesting (especially once people have real-time data), but in any case, as we state in the API documentation, the latter is not possible in this pilot. It has only an "informal" integration with the schedule, so there is really nothing to say that the trip_id of a given bus is the trip that bus is actually on. Rather, that trip_id is simply a representative trip with the right stopping pattern.

(Note: this is something we expect to remedy as we move from pilot to deployment this year)
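
To make the distinction above concrete, here is a minimal sketch of the two measures. The record layout, stop names, and timestamps are hypothetical and are not the Bus Time log format. The second function is only meaningful when the scheduled time belongs to the trip the bus is actually running, which is exactly what the pilot cannot guarantee.

```python
from datetime import datetime

# Hypothetical records: (vehicle_id, stop_id, observed arrival time).
# These names and values are illustrative only, not real Bus Time data.
observations = [
    ("bus_1", "stop_A", datetime(2011, 4, 4, 8, 0, 0)),
    ("bus_1", "stop_B", datetime(2011, 4, 4, 8, 12, 30)),
]


def running_time_seconds(observations, vehicle_id, from_stop, to_stop):
    """Seconds one vehicle took to travel from from_stop to to_stop."""
    # Keep only this vehicle's observations, keyed by stop.
    times = {stop: t for vid, stop, t in observations if vid == vehicle_id}
    return (times[to_stop] - times[from_stop]).total_seconds()


def schedule_deviation_seconds(observed, scheduled):
    """Observed minus scheduled arrival time; positive means late.

    Only meaningful if `scheduled` comes from the trip the bus is really on.
    """
    return (observed - scheduled).total_seconds()
```

Running time needs only observed times at two points, so it does not depend on the trip_id being correct.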

Thanks,
Mike

Michael Freedman-Schnapp

Apr 4, 2011, 10:49:24 PM

to mtadevelop...@googlegroups.com

I would also be very interested in a dataset covering a month or more of the times that buses arrive at each stop, in order to create a predictive algorithm. I think this is like the former, except the main organizing element is a bus run (which is an existing data element), and the specific data points given are the stop ID and the first time point recorded for the bus at that stop.

Does that make sense?
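
As a rough sketch of the dataset described above, the function below reduces a stream of location records to the first time each bus run was recorded at each stop. The tuple layout (run ID, stop ID, timestamp) is an assumption made for illustration, not the format of the logged B63 data.

```python
from datetime import datetime


def first_arrivals(pings):
    """Map (run_id, stop_id) to the earliest timestamp recorded at that stop.

    `pings` is an iterable of hypothetical (run_id, stop_id, timestamp)
    tuples, one per record in which the bus was observed at a stop.
    """
    earliest = {}
    for run_id, stop_id, ts in pings:
        key = (run_id, stop_id)
        # Keep the earliest record seen so far for this run and stop.
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    return earliest
```

Called on a month of records, this would yield one row per run and stop, which is the shape a prediction model would likely want as input.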

--------

Michael Freedman-Schnapp

m...@mikebot.com

Emilie Evans

Apr 5, 2011, 9:51:01 AM

to mtadevelop...@googlegroups.com, mfr...@mtahq.org, Michael Freedman-Schnapp, Eugenia

Hi all,

Below is similar to what I'm looking for as well: data showing the arrival/departure timestamp with each stop ID along the B63 route, something we could essentially map in GIS. For my and Eugenia's project, recognizing that for each bus each day this is a lot of data, we would be interested in perhaps looking at three "times" per day over a month's time.

I think most importantly for us, we just need to get a sense of what's available and in what format it comes in so we can assess how to approach our project.

Thanks for any help you can offer.

Emilie

Ryan Finnesey

Apr 4, 2011, 10:09:24 PM

to mtadevelop...@googlegroups.com, Eugenia Manwelyan

We are working on a very similar project right now with flight data.

Cheers

Ryan


Will R

Apr 7, 2011, 5:46:27 AM

to mtadeveloperresources

I've spoken to Michael Frumin about this in the past.

A predictive algorithm based on historic data would be an interesting
project, and it seems that many developers may repeat such research
and produce their own systems to do such a thing. I really think,
however, that this work should be done by the feed provider (MTA) and
included in the feed. Distance from stop is not really tangible to the
average bus user. Perhaps one of those doing the research could share
their results with the MTA, or perhaps the MTA could kick off a
similar project? I think it would be incredibly useful for the
end user of the system.


Frumin, Michael

Apr 7, 2011, 10:48:55 AM

to mtadeveloperresources

Will, thanks.

We don't disagree that time-based prediction would be a good thing to have. The reason we started with distance is that, because it is a precursor to time-based prediction, it was necessarily faster to deliver to customers and to developers. Of course, any work that the community can provide demonstrating that accurate predictions are feasible would help us move in that direction.

As for your claim that "distance from stop is not really tangible to the average bus user", please allow me to respectfully disagree. Take, for example, some selected feedback we have received regarding the B63 pilot:

"The new mobile website is fabulous!... It has changed my life."
"Best thing in years from the MTA.... no more standing in the cold for an hour.... just pop it up and put my coat on when the bus is a few blocks away. More bus's soon please...."
"I take the B63 and I find Bus Time to be astonishingly wonderful"
"Your system has already saved me so much time"
"...made my family's lives much easier as a result . I find the info to be very accurate at all times"

I'm working on getting some logged B63 data into good shape for letting you guys play with it for developing prediction algorithms and other analyses. There's one snafu with how the data is logged (it was a pilot...) that I want to correct first, and it may take a little bit of time.

Thanks,
Mike

Eugenia Manwelyan

Apr 7, 2011, 3:09:32 PM

to mtadevelop...@googlegroups.com

Hi Mike,
Thanks for writing back about the B63 logged data. We are actually not looking to formulate a time-based prediction algorithm, but rather comparing it with crowd sourced data for that same bus route. We hope to determine a threshold of accuracy for the crowd sourced data, as well as understand any discrepancies between the GPS-logged stop arrival times and the scheduled times.

I understand that you're currently working on formatting the GPS-logged data, correct? Any idea when this data would be available for our use?

Thanks,
Eugenia Manwelyan

Masters in Urban Planning Candidate

Graduate School of Architecture, Planning and Preservation

Columbia University

ece...@columbia.edu
202.413.3424

Frumin, Michael

Apr 7, 2011, 3:32:44 PM

to mtadevelop...@googlegroups.com

Eugenia, thanks. The data prep we’re working on is for both needs. Comparison against crowdsourced data sounds very interesting.

Again, I must repeat what is discussed in the Bus Time API documentation – it is not possible to use this pilot data to compare the observed bus arrival times to the scheduled times. In this pilot, we are not formally capturing the schedule for a given bus, so even though it looks like we’ve given it a trip ID (with implied scheduled stopping times), that’s not necessarily the right trip. You may count a bus as 5 minutes late with our data when in truth it could have been 5 minutes early. This is something we are looking to remedy in the wider deployment.

So, conclusion: we’re working to get the data out for everyone’s purposes, but at this stage it can’t be used to analyze schedule adherence.

Thanks,

Mike

Jym Dyer

Apr 7, 2011, 3:32:00 PM

to mtadevelop...@googlegroups.com

> We are actually not looking to formulate a time-based
> prediction algorithm, but rather comparing it with
> crowd sourced data for that same bus route. We hope
> to determine a threshold of accuracy for the crowd
> sourced data ....

You mean data like this?

https://foursquare.com/venue/428520

<_Jym_>

Michael Smith

Apr 7, 2011, 4:56:05 PM

to mtadevelop...@googlegroups.com

Will,

If you are interested in time-based predictions you can already use the
NextBus API (http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf) for
the B63 route and also for a good number of other transit agencies. This
feed has been used by many third-parties to create useful applications
such as Android/iPhone apps and even a real-time data based trip
planner. The feed data has also been used to gather and analyze running
times, schedule adherence, etc. The feed has block assignment and other
information that can be useful for such work. NextBus also generates
actual schedule adherence information but that information is usually
only provided to the transit agencies instead of to the general public.

Mike

Frumin, Michael

Apr 8, 2011, 8:46:45 AM

to mtadevelop...@googlegroups.com

Mike, thanks. We are thrilled to see your service making time-based predictions using our location data.

Everyone -- as I have mentioned now multiple times in this and other forums (including the Bus Time API documentation page), there are no accurate block assignments in our pilot MTA Bus Time system (nor in the GTFS data), so there can't possibly be any such assignments in any other feed which is layered on top of ours. Similarly, also as mentioned, because there's no formal integration with the schedule in our pilot system, no meaningful schedule adherence calculations can be made either (as compared to, say, headway adherence or running time analysis, which make sense regardless of the schedule).

Just a fair warning to everyone; we don't want people using data in ways that don't really make sense.
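
As an illustration of an analysis that makes sense regardless of the schedule, here is a minimal sketch that computes observed headways at a single stop. The input is assumed to be a plain list of observed arrival times at that stop; how those times would be extracted from the Bus Time data is not shown and is left as an assumption.

```python
from datetime import datetime


def headways_seconds(arrivals):
    """Gaps in seconds between consecutive bus arrivals at one stop.

    `arrivals` is a list of observed arrival datetimes in any order.
    No trip or block assignment is needed, only the observed times.
    """
    ordered = sorted(arrivals)
    # Pair each arrival with the next one and take the difference.
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
```

Because it relies only on the order and timing of observed arrivals, this calculation is unaffected by a trip_id that does not match the trip a bus is actually running.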

Thanks,
Mike

Will R

Apr 8, 2011, 7:51:46 AM

to mtadeveloperresources

Hi Michael,

Don't get me wrong, I think the work you guys are doing is very
forward thinking. I've worked with TfL in London, and they have spent
huge amounts of money developing applications for mobile and web
rather than simply opening up their data.

I think distance from the stop is of course very useful. Watching the
position of the buses on the map is great, and no doubt the end users
are loving it (I've had similar feedback for Bus New York City). But
it would be very useful to have predicted arrival times as well.
