Urdacha Data Wrangler ~ Post #1


Theo Armour

Mar 17, 2013, 6:35:19 PM
to urdacha
Hi Everybody

Yesterday was my first day looking at the Challenge data in a serious manner. Below the three asterisks you will see the text of my first write-up.

I plan to add this text, along with illustrations and links to hAxis screens, to a 'diary' that will be on http://jaanga.github.com/urdacha/

Over the next two weeks I plan to add entries on a daily basis that describe my findings.

I hope to have a preliminary 'diary' section up in a few days.

In the meantime (and probably on an ongoing basis) I will send out the text portion of the entries a day or so after the actual event.

Theo

***

After about 20 years working on this kind of software (apps that visualize large quantities of data using animated 3D displays), I am going to do something I have almost never done before. I'm going to eat my own dog food. I'm going to put my mouth where my money is. I'm going to practice what I preach. I am going to use the software that I've been working on.

For the next two weeks I will be using hAxis and the other tools I built to look at the data from the Urban Data Challenge. I hope to come up with a few interesting points of view arising from simply looking at the data.

This comes from a concept called Exploratory Data Analysis (EDA), championed by John Tukey in the 1960s and 1970s. BTW, John Tukey was a fascinating man: he probably coined the word 'software' and did coin the word 'bit', as in bits and bytes. He was also a mentor to somebody you might well know if you have been involved with graphic design: Edward Tufte.

The idea of EDA is that the data that you see can stimulate your brain and help you come up with a new paradigm, a new thought, a fresh point of view.

The reason I'm able to do EDA myself this time is interesting in itself. Up to now, writing the software has been very difficult, especially for somebody with my limited capabilities. I have generally spent all my time writing the software, building the user interface, and fixing bugs, and never actually got a chance to use what I built. Today, with tools like JavaScript, WebGL, and Three.js, building a real-time 3D application is far easier than it has ever been.

So this moment is actually pretty scary. Maybe I won't find anything interesting at all, which would be quite disappointing after working on these ideas for so many years. But I am optimistic. I hope to come up with at least one fresh, exciting thought that comes directly out of the data, and I will be delighted if we can collectively come up with ten or so different viewpoints. And who knows: we have a couple of weeks and some very smart people. Conceptually, we could rule the world.

So the first thing I plan to do is look for data points in strange places. Traditionally I would start by creating a 3D map and thinking geographically. But here, with exploratory data analysis, I plan to let the map go and forget the map. As somebody said a while ago, just let the physics rule.

So I think I'm going to start with the opposite of geography, which is time, and put three different time elements on the three axes and see what happens. My guess is that the data will form some kind of diagonal line: X for the earliest time, Z (the other horizontal axis) for the door-close time, and the time when the bus leaves on the vertical or Y axis. Since the series progresses in time order, the data points should march together toward a single point somewhere in 3D space. So that's my prediction. Let's see how it goes.
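To make the prediction concrete, here is a minimal sketch of the axis mapping described above. It assumes each stop event carries three 'HH:MM:SS' timestamps under the hypothetical keys `arrival`, `door_close`, and `departure`; the actual Challenge files may name and format these fields differently. Because the three times of one stop are only seconds apart, every point should land close to the x = y = z diagonal, and `off_diagonal` (a hypothetical helper) measures how far it strays.

```python
from datetime import datetime


def seconds_since_midnight(stamp: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds after midnight.

    Assumption: times are plain clock times; values such as '24:30:00'
    (used by some transit feeds for after-midnight trips) would fail here.
    """
    t = datetime.strptime(stamp, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def to_point(record: dict) -> tuple:
    """Map one stop event to (x, y, z), following the axis choice in the post:
    x = earliest (arrival) time, y = departure time (vertical axis),
    z = door-close time (the other horizontal axis).
    The dictionary keys are hypothetical field names.
    """
    return (
        seconds_since_midnight(record["arrival"]),
        seconds_since_midnight(record["departure"]),
        seconds_since_midnight(record["door_close"]),
    )


def off_diagonal(point: tuple) -> int:
    """Spread between the latest and earliest of the three times, in seconds.

    Zero means the point sits exactly on the x = y = z diagonal.
    """
    return max(point) - min(point)
```

For example, a made-up event arriving at 08:00:00, closing its doors at 08:00:20, and leaving at 08:00:30 maps to (28800, 28830, 28820) and sits 30 seconds off the diagonal. Plotting a whole day of such points should give the rising diagonal line, with the size of each "step" reflecting the loading time.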

Later

The results were very much as predicted. The data formed a cluster of points rising diagonally from the origin up to the opposite corner, taking the whole day to do so. There was at least one interesting point: the data did not cluster perfectly. It looks like during some periods of the day it takes longer to load the bus than during others. Think of a caterpillar inching its way up a branch: sometimes the steps are longer, sometimes shorter.

This seems obvious. What is interesting is that the times of day when these loading delays seem to occur may not be obvious.

My guess would have been that the morning and evening rush hours would have the greatest delay.

Looking at the data visualization offers a different perspective. Buses seem to run through the early morning and the morning rush hours with minimal loading times.

On weekdays, the longer loading times start to occur at around 10:30, increase at 11:30, and rise quite a bit more at 15:30. The peak seems to be between 16:30 and 20:30. By 22:00 the buses are running with very few boarding delays.
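The times above come from eyeballing the visualization. One way to attach numbers to them is to average the loading time in half-hour slots, matching the half-hour marks in the text. This is a minimal sketch, assuming each event is reduced to a pair of arrival and departure times in seconds after midnight and that "loading time" means departure minus arrival; measuring door-close minus arrival instead would be an equally reasonable choice. The function names are hypothetical, not part of hAxis.

```python
from collections import defaultdict
from statistics import mean


def dwell_seconds(arrival: int, departure: int) -> int:
    """Loading delay for one stop event, in seconds (assumed definition)."""
    return departure - arrival


def mean_dwell_by_slot(events, slot_minutes: int = 30) -> dict:
    """Average dwell time per time-of-day slot.

    events: iterable of (arrival, departure) pairs in seconds after midnight.
    Returns {slot_start_in_minutes_after_midnight: mean_dwell_seconds},
    ordered by slot. Each event is assigned to the slot of its arrival time.
    """
    slot_len = slot_minutes * 60
    slots = defaultdict(list)
    for arrival, departure in events:
        slot_start = (arrival // slot_len) * slot_minutes
        slots[slot_start].append(dwell_seconds(arrival, departure))
    return {slot: mean(dwells) for slot, dwells in sorted(slots.items())}
```

With three made-up events, two around 07:00 with 15 s and 25 s dwells and one at 17:00 with a 90 s dwell, the result is `{420: 20, 1020: 90}`, where 420 and 1020 are minutes after midnight (07:00 and 17:00). Running the same function separately over weekday, Saturday, and Sunday events would give the comparison described in the next paragraph.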

On weekends the story seems similar. The differences are that on Saturday the delays continue later into the evening, and on Sunday the delays are more evenly spread out.

Thus one can probably surmise that the delays are not caused by commuters but are mostly due to non-commuters.

As of now I have no numbers to back up my assertions. But what I do have is probably more important: a new theory to test.

Further thoughts

Some of the things I see will be obvious to a transportation planner. But if I can see things that normally only a transportation planner would see, then what amazing things might a transportation planner familiar with the app be able to see?

Somehow I have to show this to you. In the next day or so I will start preparing the links with URLs that reproduce what I was seeing. Hold on tight...


