[Pune] Crowdsourcing: Play match-the-following and help build Pune's bus route information data!


Nikhil VJ

Jul 4, 2015, 1:20:59 AM
to datameet
Hi Friends,

Some of you whom I've put in Bcc had signed up on the Datameet Pune
interest form. While Craig is working on how to organize meetups
(found a venue, planning dates etc), I'd like to share a collaborative
task with you.

Here's something that some volunteer groups (ImproveMyCity, Pravasi
Manch, Pimpri-Chinchwad Citizens Forum) are up to. IMC is building a
website for PMPML (Pune's bus system), and we're helping PMPML put the
bus route info into a standardized common structure that can then be
used for properly informing the public. We've hit many roadblocks
with this process before and are now much wiser for it.

There's an easy but repetitive task that we need to hammer out: we
have a standardized list of bus stops in English, each with a unique
code and lat-long, which was compiled and freely shared by an org
called ITDP. And we have a separate, longer list of bus stop names in
Marathi which we got from an earlier unorganized dataset. We need to
fill in the Marathi counterparts of the English stop names.

I've put this all on this google spreadsheet:
https://docs.google.com/spreadsheets/d/1ppFJeb7Dnj6-1yvniH2Q6exQFzK6fM0wsZNn4XZIvkI/edit?usp=sharing

There are just under 1800 stops to bilingual-ize this way.

So what you have to do:
Play match-the-following, or type some Marathi words in. Simple!
Please write back to me if you're interested in doing a set of rows
for half an hour or so today, and I will add you to the doc as an
editor.

Target completion time: Sunday 5th July EOD. But if you can give some
time today before 4pm that would be great, as we're meeting PMPML's
CMD at that time to discuss all this. He's recently taken charge
and is very passionate about making all the information transparent
and accessible to the public.

--------------------------------------------

Why this is important :
If you look at the routes sheets in the same google spreadsheet, we're
going to populate them with existing route info prepared by ITDP (a few
years old), and have PMPML staff edit and update the route information
using this core list of bus stops. When a new stop is added to a route,
it has to be linked to the other stop info like stopcode and lat-long.
Building up this way, with properly cross-linked information instead
of arbitrary entries, will then enable all the information management
features one needs from a modern public transport service.
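
A minimal sketch of what "properly cross-linked" could mean in practice, assuming a route is stored as a list of stopcodes and the core stop list is keyed by stopcode. The field names (`name_en`, `name_mr`, `lat`, `lon`), the stopcodes and all values are invented for illustration and are not taken from the actual spreadsheet.

```python
# Hypothetical core stop list: stopcode -> stop details (all values invented).
STOPS = {
    "S0001": {"name_en": "Example Stop A", "name_mr": "उदाहरण थांबा अ",
              "lat": 18.5000, "lon": 73.8500},
    "S0002": {"name_en": "Example Stop B", "name_mr": "उदाहरण थांबा ब",
              "lat": 18.5100, "lon": 73.8600},
}


def check_route(route_stopcodes, stops):
    """Return the stopcodes in a route that are missing from the core list."""
    return [code for code in route_stopcodes if code not in stops]


def expand_route(route_stopcodes, stops):
    """Replace each stopcode with its full record; fail loudly on unknown codes."""
    missing = check_route(route_stopcodes, stops)
    if missing:
        raise KeyError(f"Unknown stopcodes: {missing}")
    return [dict(stopcode=code, **stops[code]) for code in route_stopcodes]
```

With this arrangement an arbitrary, unlinked entry shows up immediately as a missing stopcode instead of silently entering the route data.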

Some background:
These exercises had been done in the past, but the technologies used,
core data like this, etc. weren't properly shared or passed on, and
PMPML themselves weren't involved and so never had the standardized
stuff in their systems. A GTFS feed (the format that google maps uses)
was created, but that's basically not human-manageable: the main
datafile is around 6 lakh lines long.

So now we want to do things differently. A very important component
here is OPEN data, and transparency at the input end. This data format
that we've commonly agreed upon is both human and machine readable,
and once set up, will be a lot easier to maintain. PMPML will have
constant direct access to the system and they will be able to edit the
data themselves as soon as there are changes from their end.

It would be great to have a full-on database-powered system.. I am
clueless on that but knew a few excel hacks so am doing it this way.
If YOU have the skills to drum something up then welcome aboard!

-------------------------------

Coming back, breaking it down to simple parts : The task for right now
is just to fill Marathi names for each bus stop.

https://docs.google.com/spreadsheets/d/1ppFJeb7Dnj6-1yvniH2Q6exQFzK6fM0wsZNn4XZIvkI/edit?usp=sharing

--
Cheers,
Nikhil
+91-966-583-1250
Pune, India
Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
http://nikhilsheth.blogspot.in

Devdatta Tengshe

Jul 4, 2015, 1:35:12 AM
to data...@googlegroups.com

Hi Nikhil,

I'd strongly suggest that we stay with GTFS as the standard.

There are many applications that work with it, since it's an open standard. I'm in Pune and I can help you with this.

Regards,
Dev

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org

srinivas kodali

Jul 4, 2015, 9:57:50 AM
to data...@googlegroups.com

Nikhil,

It's better to add the stops to OpenStreetMap and add the Marathi/Hindi names there.

OpenStreetMap is not accepting any bulk uploads, so there is no point in making the dataset if it has to be redone again on OSM.

You can use iD to map the bus stops and provide the details along with the agency name. This is easy with a tool, and it also makes sense to contribute to OSM.

Regards,
Srinivas Kodali

Nikhil VJ

Jul 4, 2015, 10:50:41 AM
to data...@googlegroups.com
Hi Devdatta,

Would be great to have you on board!

Yes, we really need a GTFS-outputting system here.. it doesn't exist
yet so there's nothing at present to stick to.

Here's the PMPML data in GTFS format, latest copy that's from 2009 or
2011 or something:
https://drive.google.com/open?id=0B8sY5vZfk1W5Z1dZa2dmTlY4aHc

The main data file in GTFS is around 6 lakh lines long for PMPML. That
too with all arbitrarily calculated trip times (to the last second :P)
that don't reflect reality. And for human purposes it's practically a
binary. It can't be updated; it needs to be over-written.

When you need to change a program, you need something that'll edit the
source code, not the binary. Unfortunately in PMPML's case we've only
inherited an old binary.

And that's exactly what we're trying to work on here: a system
that'll output GTFS. We need to bring PMPML's data up to a level of
standardisation where it becomes possible to create a GTFS feed from it,
and yet it needs to remain linked to PMPML as an original,
self-sufficient structure that they can own, refer to, and edit any
time *by themselves*. What had happened so far was that the people
working on the GTFS completely de-linked it away, so PMPML themselves
no longer had any access to their own data. One huge factor in all
this was the translation from Marathi to English and the deletion of
all Marathi references. The original data was completely in Marathi.
The people maintaining the data are going to be doing so in Marathi.
So we had completely de-linked sets of data on the input and the
output side and that has been a major blocker. GTFS unfortunately
doesn't have a bilingual feature, so that's one place we're going to
have to go beyond it : any system that maintains the data must do so
in both languages.

And the past GTFS creation was done in a copyrighted way so there's no
way to get that system back now without coughing up another fortune.
That's not going to happen anytime soon as we're having a deliberate
underfunding of PMPML going on right now, with desperately needed
funds being diverted to 100 times more costly projects like metro.

If you know something that will work over the internet or without
needing installation, so that staff at PMPML can edit it, in Marathi
language, without needing to be engaged in a full-fledged contract
with an IT company, then please advise.

I've written notes on my analysis of GTFS format and how we could go
about this, here:
https://datameet.hackpad.com/Public-Transport-GTFS-Format-Analysis-soh7vdzRsW5


So if you can explore how to make a database of this and have a system
of creating, editing or dropping routes...
One core requirement here is what is referred to as "child forms" or
one-to-many relationship in forms. Basically, you fill in a stop on
the route. Then you click on an "Add More" button to add another stop.
Then you can drag the stops up and down to re-order them. So we need a
web-form system that enables that : Edit a route >> you get a listing
of the stops. You can add new stops or drop existing ones. Entry
should be in search-as-you-type mode.. and with google spreadsheet's
data validation I've got that feature. And the core data of course is
the stops; they'll be like the inventory / stocks list, and the routes
will simply be like different purchase orders or similar listings of
inventory items.
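
The "child forms" idea above (one route, many ordered stops) can be sketched as a small data model. This is only an illustration of the add / drop / re-order behaviour, not a web form; the class name `Route` and its methods are hypothetical.

```python
class Route:
    """One route holding an ordered list of stopcodes (the 'child' rows)."""

    def __init__(self, route_id, stopcodes=None):
        self.route_id = route_id
        self.stopcodes = list(stopcodes or [])

    def add_stop(self, stopcode, position=None):
        """The 'Add More' button: append, or insert at a 0-based position."""
        if position is None:
            self.stopcodes.append(stopcode)
        else:
            self.stopcodes.insert(position, stopcode)

    def drop_stop(self, position):
        """Remove and return the stop at a 0-based position."""
        return self.stopcodes.pop(position)

    def move_stop(self, old_position, new_position):
        """Drag a stop up or down the list."""
        code = self.stopcodes.pop(old_position)
        self.stopcodes.insert(new_position, code)

    def rows(self):
        """Flatten to (route_id, sequence, stopcode) rows, sequence from 1."""
        return [(self.route_id, seq, code)
                for seq, code in enumerate(self.stopcodes, start=1)]
```

The flattened `rows()` output is the shape a database table or a spreadsheet sheet would store: one line per stop on a route, with a sequence number that is regenerated after every re-order.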

While I've seen this kind of feature at many places, I haven't been
able to find a way of using it here (been searching a lot!), and I'm
not a from-scratch PHP/SQL/.NET etc programmer. I know Javascript and
can work with XML and JSON, and with the excel format I've been able
to use formulas that render all the data in nested XML/JSON.

That's actually how the present pmpml.org site's system is working.. I
churned out a nested XML and the website team used it. Give the
website (www.pmpml.org) a try.. search for "Pune Station", and you'll
know why we need bus stop rationalisation ;) My role at present is in
helping to convert the data that PMPML has to a form that can be read
by a program AND be edited by a human (techies often miss the latter
part!). We're going with whatever's best suited, the administration is
co-operating fully, so hop on!

(sorry if this reply doesn't seem well-structured.. )


Nikhil VJ

Jul 4, 2015, 11:38:40 AM
to data...@googlegroups.com
(Saw Srinivas's email after replying to Devdatta's)

Hi Srinivas,

That would be great. But the list of stops we're using was created by
ITDP a couple of years ago. Some stops have shifted, plus when I
loaded it on to geojson.io, most of the locations seem to be off to
the east by 50-odd meters (instrument error probably). So the data
will need location updating. What's valuable is the rationalization of
bus stops with unique ID assigned to each, and the ITDP folks did a
fair amount of ground-truthing and filtered out many defunct stops,
added new ones, etc. That's why we're going with it. And even still,
there will be updates to do and I'm not sure if we can successfully
train the staff at PMPML to update on OSM regularly or to pull the
listings from there and update their data with it. It's good that the
lat-longs are there, but they're not critical at this point.

The system that evolves would decide whether linking with OSM is
doable or not. At present, what we have is a google spreadsheet.. can
you give a cell formula that retrieves the stops from OSM and lines
them up as search-as-you-type dropdown options when one is adding a
stop to a route? Or a script that pulls the data from OSM and formats
it in CSV with columns like the stops data we have at present, so that
we can copy in?
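
One possible shape for such a script, as a sketch only: the function below converts an already-downloaded Overpass API JSON response into CSV text. The output column names are assumptions, not the spreadsheet's actual headers, and the bounding box in the query is a rough placeholder rather than a surveyed extent of Pune. The download step itself is left out.

```python
import csv
import io

# Overpass QL query for bus-stop nodes inside a bounding box, given as
# (south, west, north, east). The box is a rough placeholder.
OVERPASS_QUERY = """
[out:json][timeout:60];
node["highway"="bus_stop"](18.40,73.70,18.65,74.00);
out body;
"""


def overpass_to_csv(overpass_json):
    """Turn a parsed Overpass JSON response into CSV text, one row per stop."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Column names are assumed for illustration.
    writer.writerow(["osm_id", "name_en", "name_mr", "lat", "lon"])
    for element in overpass_json.get("elements", []):
        if element.get("type") != "node":
            continue
        tags = element.get("tags", {})
        # Prefer an explicit English name, else fall back to the default name.
        name_en = tags.get("name:en") or tags.get("name", "")
        writer.writerow([element["id"], name_en, tags.get("name:mr", ""),
                         element["lat"], element["lon"]])
    return out.getvalue()
```

The query text would be sent as the `data` parameter to an Overpass endpoint (the public instance lives at https://overpass-api.de/api/interpreter), and the parsed JSON reply passed to `overpass_to_csv`. Stops that lack a `name:mr` tag on OSM come out with an empty Marathi column, so the result would still need manual filling.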

PS: Do share more advice, but kindly keep it clear that there are no
real coders on board at present who will be able to do what you're
suggesting. For example, when one says "use OpenTripPlanner", one
ought to give the actual *thing* that will do it, making adjustments
for all the requirements shared, with proper usage instructions, and
not leave it to the side having the requirements to figure it all out.
Or someone else in the community needs to do it and share it. This
isn't stackexchange (Hey I should go post something there too!). It
would be wonderful if more persons could just take the stuff that has
been shared open-source and implement the solutions that they think
should be used. I'm available for clarification on understanding the
data and the requirements, and if something awesome can come out from
this network then I'll see to it that there is official recognition
from the authorities for the efforts made. I'm actually working out
plans for a design contest that PMPML could announce for this.

Nikhil VJ

Jul 18, 2015, 12:20:57 PM
to data...@googlegroups.com
Hi Friends,

Update on bus stops data: 100% Marathi-fication complete!
The spreadsheet is now ready for updating route info. Will go meet the
PMPML folks whenever I can next week and start. Users will be able
to add new routes or edit existing ones by filling in either the
Marathi or the English name of a stop, and the formulas will look up
the stopcode, etc.
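
Outside the spreadsheet, the same either-language lookup could look roughly like this. It is a sketch that assumes each stop is a record with `stopcode`, `name_en` and `name_mr` fields; those field names and the sample records are hypothetical.

```python
import unicodedata

# Hypothetical sample records (all values invented).
SAMPLE_STOPS = [
    {"stopcode": "S0001", "name_en": "Example Stop A", "name_mr": "उदाहरण थांबा अ"},
    {"stopcode": "S0002", "name_en": "Example Stop B", "name_mr": "उदाहरण थांबा ब"},
]


def normalize(name):
    """Trim, Unicode-normalize and case-fold so near-identical spellings match."""
    return unicodedata.normalize("NFC", name.strip()).casefold()


def build_name_index(stops):
    """Map every English and Marathi name to its stop record."""
    index = {}
    for stop in stops:
        for field in ("name_en", "name_mr"):
            if stop[field].strip():
                index[normalize(stop[field])] = stop
    return index


def lookup_stop(name, index):
    """Return the stop record for a name in either language, or None."""
    return index.get(normalize(name))
```

If two stops ever shared the same name, the later record would overwrite the earlier one in this index, so a real version would need to detect and report such duplicates.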

Meanwhile, the structure is now ready, for anyone interested, to
program GTFS creation, or any other way of extracting info, like
XML/JSON.
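
As a starting point for such an extraction, here is a sketch that writes a GTFS `stops.txt` (using only the `stop_id`, `stop_name`, `stop_lat`, `stop_lon` columns) and a nested JSON of routes. The input field names and the sample records are assumptions for illustration; other GTFS files such as `stop_times.txt` need trip timings, which this sketch does not cover.

```python
import csv
import io
import json

# Hypothetical stop records and routes (all values invented).
EXPORT_STOPS = [
    {"stopcode": "S0001", "name_en": "Example Stop A",
     "name_mr": "उदाहरण थांबा अ", "lat": 18.5, "lon": 73.85},
    {"stopcode": "S0002", "name_en": "Example Stop B",
     "name_mr": "उदाहरण थांबा ब", "lat": 18.51, "lon": 73.86},
]
EXPORT_ROUTES = {"R1": ["S0002", "S0001"]}


def stops_to_gtfs(stops):
    """Render stop records as the text of a GTFS stops.txt file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["stop_id", "stop_name", "stop_lat", "stop_lon"])
    for stop in stops:
        writer.writerow([stop["stopcode"], stop["name_en"],
                         stop["lat"], stop["lon"]])
    return out.getvalue()


def routes_to_json(routes, stops):
    """Render routes as nested JSON, each route carrying its full stop records."""
    by_code = {stop["stopcode"]: stop for stop in stops}
    nested = [{"route_id": route_id,
               "stops": [by_code[code] for code in codes]}
              for route_id, codes in routes.items()]
    # ensure_ascii=False keeps the Marathi names readable in the output.
    return json.dumps(nested, ensure_ascii=False, indent=2)
```

Because the Marathi names stay in the source records and only the export picks a language, the human-editable data and the generated feed remain linked.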

https://drive.google.com/open?id=1ppFJeb7Dnj6-1yvniH2Q6exQFzK6fM0wsZNn4XZIvkI