SOPHIE LAMPARTER | Head of Public Programs | t: (415) 912 5901 x108 | swissnex San Francisco |
Hi Theo
Thank you for your answer.
May we have a phone call today?
Because of the 9-hour difference between our time zones, this would be possible from 8.30 PM (CET) = 11.30 AM your time. (I’m not available between 07.30 and 11.30 PM your time for personal reasons.)
Best Regards
Bruno
verkehrsbetriebe zürich - www.vbz.ch
bruno mändli, head of IT
luggwegstrasse 65, P.O. box, 8048 zürich
tel. direct +41 44 434 44 40, fax +41 44 434 47 84
a company of the city of zürich
Please don't print this e-mail unless you really need to.
From: t.ar...@gmail.com [mailto:t.ar...@gmail.com] On behalf of Theo Armour
Sent: Saturday, 13 April 2013 02:17
To: Mändli Bruno (VBZ)
Cc: Sophie Lamparter; joh...@swissnexsf.org (joh...@swissnexSF.org); Conde Antonio (VBZ); Lutz Richard (VBZ)
Subject: Re: [UrbanData] Comments on the VBZ Data Set
Hi Bruno
Thank you very much for your very detailed and speedy response.
I had been planning to send an add-on saying that replying in detail to my 'rant' would not be necessary - but this was delayed because of other matters. Sorry for this.
Here's why: The over-arching element in all this is that VBZ actually did publish data in an open manner, and people used it and VBZ obtained a variety of feedback. And, hopefully, feedback of all types that VBZ might not have received without running through this process. This is cool and the way of the future.
Some thoughts: I have ridden on the systems of all three cities in the challenge - and all have their amazing aspects. Furthermore I was #3 architect in charge of designing 12 MTRC stations in Hong Kong and, in yet another life, I was the program manager charged with designing three releases of AutoCAD at Autodesk. So dealing with reality, data and complexity seems to come naturally to me.
Therefore my 'gut' feeling is that if we continue this conversation it will soon become so interesting that we will not be able to get anything else done.
;-)
So let us consider the discussion as not requiring any further effort.
Sorrows: I am sorry that you were not able to get the app to run. One issue that I should have noted in the 'readme' is that, in these preliminary efforts, we only work and test with the latest version of the Google Chrome browser.
Also, our recent focus is on working with real-time data (such as the data from nextbus.com) and building 3D apps so engineers can carry out Exploratory Data Analysis. Therefore our skills with historical data in 2D need improving.
In any case, all the source code is in JavaScript on GitHub so any coder can quickly see the logic of what we were working on.
Conclusions: Bruno - again thank you for thinking so hard about all this. I agree with much of what you say - though not all of it. But the main thing is for you to keep the VBZ buses and the data flowing and running smoothly...
Warm regards,
Theo
On Fri, Apr 12, 2013 at 6:28 AM, Mändli Bruno (VBZ) <Bruno....@vbz.ch> wrote:
Hi Theo
Thank you very much for all the work you put into this topic.
I think there are many misunderstandings at the moment. I will try to give answers.
I really would like to propose having a phone call first, before taking any further action. My mobile phone number is +41 79 430 91 66. I would call you if you give me your number.
Regards, Bruno
My answers:
Data Anomalies
As I remember, I said that the data staging process is the most important part. So we focused on this process to get a proper data baseline.
Collected operational data will never be perfect. We’re not doing financial transactions. You cannot add or correct missing or implausible data during the staging process. You have to do the best with the data you get.
We use a BI system, containing several data marts, to obtain fast access for analysis.
(On the other hand, we use a vendor-proprietary statistical application for detailed analysis of operational and quality questions.)
Quality does not mean adding or “correcting” data, but putting the right data sets together, marking implausible data and knowing some operational conditions. Missing operational data are absolutely not anomalies. Theo, if you run the kind of analysis that all of you did for the Urban Data Challenge, what is the difference in the confidence interval of a statistical analysis if the result is based on 80%, 90% or 100% of the total amount of data?
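To put rough numbers on the confidence interval question, here is a minimal sketch. It assumes the missing records are missing at random and that the usual normal approximation for a mean applies. The record count and the delay standard deviation are hypothetical values chosen for illustration, not figures from the VBZ data set.

```python
import math

def ci_half_width(sample_std, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * sample_std / math.sqrt(n)

# Hypothetical numbers, not taken from the VBZ data set:
total_records = 10_000   # stop events that would exist with no gaps
delay_std_s = 60.0       # standard deviation of delay, in seconds

for share in (0.8, 0.9, 1.0):
    n = round(total_records * share)
    half_width = ci_half_width(delay_std_s, n)
    print(f"{share:.0%} of records: mean delay +/- {half_width:.2f} s")
```

Under these assumptions the interval widens only by a factor of 1/sqrt(0.8), about 12%, when a fifth of the records is missing. If the gaps are not random (for example, concentrated on one route or one time of day), this estimate no longer holds.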
I also said that data staging and plausibility checks are the most important and the most CPU-intensive steps. (Maybe I said that after the plenary discussion.) And this is really true. But this does not mean that you’ll get 100% correct data. It means that, after the staging process, you have proper data to use for analysis and reports.
On the other hand, I said that we are focusing on customer services such as service quality, passenger information quality and transfer information quality. (We also run transfer protection, even between several public transport providers.)
One thing I have been concerned about from the beginning of this challenge is the influence of dispatch actions. We run active dispatch actions such as trip offset, reassignment of run or block, and many more.
It would not have been possible for you to take all these things into consideration within this challenge. You can’t do such things within weeks. But these things affect our operational data, too.
This is an item we discussed in the jury team.
As I said on Saturday, we have never published any data so far. We had an ongoing discussion about this situation. My concerns were exactly about what has happened now. Just delivering raw data would be easy, but what would the result be in the end, regarding an app, if you were not aware of all the internals? With a real-time passenger information app, passengers could get wrong information.
And after reading your email, I’m even more convinced not to publish raw data only. Additional data should be prepared to show things from a passenger’s point of view (as we do with our information services). A simple example: if there is a general delay of 8 minutes on a route, and the trip interval is 8 minutes, the situation would be perfect from a passenger’s point of view. The delay would just be a “technical delay”. Passengers do not care about our run/route or block numbers. They just expect a departure at a certain point and at a certain time. In this situation, almost everything would be perfect.
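The “technical delay” example can be sketched in a few lines. The departure times below are hypothetical and only illustrate the point that a uniform 8-minute delay on an 8-minute interval leaves the headways that passengers experience unchanged.

```python
def headways(times_min):
    """Gaps between consecutive departures, in minutes."""
    return [later - earlier for earlier, later in zip(times_min, times_min[1:])]

# Hypothetical departures at one stop, in minutes after the hour.
scheduled = [0, 8, 16, 24, 32]
actual = [t + 8 for t in scheduled]   # every trip runs 8 minutes late

# Operator's view: deviation of each trip from its own schedule.
schedule_deviation = [a - s for a, s in zip(actual, scheduled)]

# Passenger's view: deviation of the gaps between vehicles.
headway_deviation = [
    ha - hs for ha, hs in zip(headways(actual), headways(scheduled))
]

print(schedule_deviation)  # [8, 8, 8, 8, 8]
print(headway_deviation)   # [0, 0, 0, 0]
```

Every trip is 8 minutes late against its own schedule, yet a vehicle still departs every 8 minutes.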
Unfortunately we can’t run your app. Only the “Test” works. And what I could see there using “dataset 1a” are pull-out trips from the garage. Theo, we delivered all the data we had, even all the unproductive segment parts.
Then, all data comes from the real operation. And you cannot expect “five nines” quality from real operation. That would be fake, nothing else.
Bunches and Gaps
Unfortunately we could not run your application, so we cannot view your data.
I know the situation in Zürich, and I learned about the situation in SF, both cities just by riding public transport.
My proposal: Just try the service in reality and you will see.
The problem in a normal situation is that if a vehicle gets delayed, for whatever reason, it tends to get more delayed because of the larger passenger exchange and other factors, especially in rush-hour periods.
So, what I said was that we are in the comfortable situation of being able to adjust vehicle capacity, e.g. by using the big 4-axle trolleybuses.
Additionally, we use our CAD/AVL system, which supports dispatch actions to handle such situations if they occur. This is a comfortable situation, too.
Theo, what possibilities do you have to manage these situations, from a given baseline, except to
- adjust vehicle capacity
- adjust the trip interval (it’s fixed in Zürich within a given time period of the day)
- do effective traffic light preemption (this works well in Zürich)
- have wide low-floor doors to enable fast passenger exchange, even for disabled persons
- give public vehicles separate lanes
- adjust the schedule (which may result in longer travel times)
By the end of this year, all Zürich buses will have low-floor doors. And today, usually every second tramway trip has low-floor doors. On the passenger information display, passengers can see whether the next tram will have low-floor doors or not.
We cannot really prevent other situations such as vehicle breakdowns, blocking of a route (for any reason) or the influence of private traffic. Zürich does not have a metro; traffic is all on the surface. And there is a lot of interaction between private and public traffic. So, I hope you compare e.g. tram routes with tram routes. Our system also supports dispatch actions if such situations occur (including reinforcement trips).
And, of course, you are right: the smaller the planned trip intervals, the more they tend to bunch. So you also have to take this factor into consideration if you compare situations. And Theo, do you really compare route by route?
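One way to make bunches and gaps comparable across routes with different planned intervals is to express each observed headway as a ratio of the scheduled headway. The following is a minimal sketch of that idea. The function name, the thresholds of 0.5 and 1.5, and the arrival times are all assumptions for illustration, not definitions used by VBZ or by the challenge teams.

```python
def classify_headways(arrivals_min, scheduled_headway_min,
                      bunch_ratio=0.5, gap_ratio=1.5):
    """Label each gap between consecutive vehicles as 'bunch', 'gap' or 'ok'.

    The thresholds are hypothetical: a headway below half the planned
    interval counts as a bunch, above 1.5 times as a gap.
    """
    labels = []
    for prev, cur in zip(arrivals_min, arrivals_min[1:]):
        ratio = (cur - prev) / scheduled_headway_min
        if ratio < bunch_ratio:
            labels.append("bunch")
        elif ratio > gap_ratio:
            labels.append("gap")
        else:
            labels.append("ok")
    return labels

# Hypothetical arrivals at one stop on a route with an 8-minute interval.
# The third vehicle is delayed; the fourth catches up right behind it.
arrivals = [0, 8, 21, 23, 32]
print(classify_headways(arrivals, scheduled_headway_min=8))
# ['ok', 'gap', 'bunch', 'ok']
```

Because the labels depend on the ratio rather than on absolute minutes, a 2-minute headway counts as a bunch on an 8-minute route but not on a 3-minute route.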
Regarding passenger counting data: You’re right. Zürich equipped 20% of the vehicles with a passenger counting system. So far, there is no need to do more, because passenger load is calculated by statistical projection.
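The email does not describe how the statistical projection is done. As a rough illustration only, the simplest possible version scales the average boardings of the counted trips up to all trips of a route. The function name and all numbers below are hypothetical.

```python
from statistics import mean

def project_total_boardings(counted_trip_boardings, total_trips):
    """Scale the mean boardings of counted trips up to all trips.

    A deliberately simple estimator for illustration; a real projection
    would likely stratify by route, time of day and vehicle type.
    """
    return mean(counted_trip_boardings) * total_trips

# Hypothetical: 4 of 20 trips (20%) ran with a counting-equipped vehicle.
counted = [42, 55, 38, 61]
print(project_total_boardings(counted, total_trips=20))  # 980
```

An estimate like this gives a total for the route, but it says nothing reliable about one specific uncounted trip at one specific stop.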
Further Data Set Issues
As I said several times, we have never published data so far. I agree that the data we delivered was probably not in the most understandable form. As you can see, we delivered all data, even what we call “unproductive segments”. Theo, we did not get any questions or further requests about our data.
We delivered our data “as is”, coming from our BI system. You are right, there is redundancy. But, all things you mentioned are used in our system. There is 100% transparency.
As I mentioned before, operational data cannot be perfect, and there has to be a way to mark missing attributes.
Conclusion
I really think we should not discuss the representation of geo data. We use the WGS84 format. It’s a common format for GPS receivers.
All other things: It would have been great, if you had contacted us earlier.
- Yes, only 20% of our vehicles are equipped with a passenger counting system. With statistical projection it is not possible to break down missing passenger data to a certain trip and a certain stop.
- Within our BI system, staging of data is done before data is loaded into the different data marts.
- There is nothing to hide within our data marts. And, of course, we do not add or change data within our staging process (except to mark implausible data sets or to set attributes as missing).
- Missing operational data sets or attributes, e.g. missing GPS data, are not “anomalies”. For example, every 2nd radio datagram coming from a vehicle does not have GPS data. Because we currently get a datagram every 6 to 12 seconds, this is no problem for us at all. (With every 2nd datagram we send a “logical distance in meters” instead of the GPS data.)
- I’m sorry about the comma and semicolon issue.
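The staging rule described in the list above (mark, but never add or change) can be sketched as follows. The `Datagram` record, its field names and the flag names are hypothetical and are not taken from the VBZ data set or BI system.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Datagram:
    """Hypothetical vehicle datagram; field names are illustrative only."""
    vehicle_id: str
    timestamp_s: int
    lat: Optional[float] = None
    lon: Optional[float] = None
    logical_distance_m: Optional[float] = None
    flags: List[str] = field(default_factory=list)

def stage(datagram):
    """Flag missing or implausible GPS values without altering them."""
    if datagram.lat is None or datagram.lon is None:
        # Expected for datagrams that carry a logical distance instead.
        datagram.flags.append("gps_missing")
    elif not (-90 <= datagram.lat <= 90 and -180 <= datagram.lon <= 180):
        datagram.flags.append("gps_implausible")
    return datagram

with_gps = stage(Datagram("tram-1", 0, lat=47.38, lon=8.54))
without_gps = stage(Datagram("tram-1", 6, logical_distance_m=120.0))
bad_gps = stage(Datagram("tram-1", 12, lat=473.8, lon=8.54))

print(with_gps.flags, without_gps.flags, bad_gps.flags)
# [] ['gps_missing'] ['gps_implausible']
```

The implausible latitude stays in the record as delivered; a downstream analysis decides whether to exclude flagged rows.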
Theo, in the end, we are focused, from a passenger’s perspective, on service quality, on passenger information (which also includes transfer information, as I told you) and on transfer protection. We spend much more time on these things than on statistical data. But it would not be possible to reach today’s quality, for example in real-time passenger information, if the data quality were as bad as you describe in your email. And we use our statistical data, of course, to improve our service, too.
Theo, another example: another Urban Data Challenge team implemented a reliability analysis. As you could see, VBZ is not that bad concerning reliability. But if we apply transfer protection with our system, and vehicles therefore have to wait for other delayed vehicles (also trains at the train station), schedule reliability will decrease. But imagine you are standing at a bus stop at 11 pm, with the next bus coming in one hour: service reliability increases if we do so, even if schedule reliability decreases. Often these things look different if you change your focus.
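The trade-off between schedule reliability and service reliability can be shown with a small calculation. The one-hour interval comes from the example above. The 3-minute feeder delay and the function itself are hypothetical, and the sketch assumes the transferring passenger reaches the stop exactly that many minutes after the scheduled departure.

```python
def transfer_outcome(hold_bus, feeder_delay_min, headway_min):
    """Return (bus departure delay, extra wait of the transfer passenger).

    Both values are in minutes. Assumes feeder_delay_min < headway_min.
    """
    if hold_bus:
        # The bus waits: it leaves late, but the passenger boards at once.
        return feeder_delay_min, 0
    # The bus leaves on time: the passenger waits for the next one.
    return 0, headway_min - feeder_delay_min

print(transfer_outcome(True, feeder_delay_min=3, headway_min=60))   # (3, 0)
print(transfer_outcome(False, feeder_delay_min=3, headway_min=60))  # (0, 57)
```

Holding the bus costs 3 minutes of schedule adherence and saves the transferring passenger 57 minutes of waiting.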
With kind regards
Bruno
From: Sophie Lamparter [mailto:sophie.l...@swissnexsanfrancisco.org]
Sent: Thursday, 11 April 2013 03:20
To: Theo Armour
Cc: joh...@swissnexsf.org (joh...@swissnexSF.org); Mändli Bruno (VBZ)
Subject: Re: [UrbanData] Comments on the VBZ Data Set
Hi Theo,
Thank you so much for your insights! I wanted to thank you anyway for all the incredible work you and your team did
on the data sets for your project and also for helping other people in the group. You really had a great impact on creating an active, participatory Urban Data Challenge
community.
We discussed your project for a long time in the jury, but in the end it did not make it as an exhibition proposal
because it was not easily understandable / readable for a general audience and also not very visual.
But I do think you should be in conversation with the public transportation departments of the three cities,
because you could really help give them insights into their own data sets.
I have CC'd Bruno Mändli, who is probably back in CH by now, so he can respond directly to your
suggestions.
Thanks a lot again & continue the good work, and let me know if I can assist you in any way.
Sophie
| SOPHIE LAMPARTER | Head of Public Programs | t: (415) 912 5901 x108 | swissnex |