So I decided to at least have a go at using the myttc feed with
libroutez today, to see what the memory usage was like (I did a bunch
of work yesterday and today on making the library more memory
efficient). Anyway, I ran into problems almost immediately when I
tried to create a graph out of the feed using libroutez's creategraph
tool (which is based on Google's transit feed Python library):
wlach@addiction:~/git/libroutez> creategraph.py ttc_20090115_0513.zip
~/src/geodata/geodata.osm ttc.routez
...
Invalid value 0.438062744141E2 in field stop_lat
Invalid value -0.795270690918E2 in field stop_lon
...
Why were the latitude/longitude pairs written in exponential notation?
Shouldn't they look like (e.g.) 43.80, -79.52?
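For what it's worth, the values are just scientific notation and parse
fine as floats; it seems to be the feed validator that insists on plain
decimals:

    # the rejected values are ordinary scientific notation:
    lat = float("0.438062744141E2")    # == 43.8062744141
    lon = float("-0.795270690918E2")   # == -79.5270690918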
P.S. Does anyone have complete raw OpenStreetMap data for the greater
area covered by the TTC? Kevin sent me some data in SQL form, but I'd
prefer to just be able to use OSM instead of writing yet another
geodata parser for libroutez.
--
William Lachance <wrl...@gmail.com>
I guess I could clean up the data on my end with a small script, but
IMO it would be preferable to get this sorted out on your end, for the
benefit of other people consuming the data. Shouldn't fixing this just
be a matter of doing a bit of string manipulation/conversion in
whatever Ruby script you use to generate the GTFS?
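Something along these lines would do it. This is Python rather than
Ruby, and the filenames are just placeholders, but the idea is to
rewrite the stop_lat/stop_lon fields of stops.txt as plain decimals:

    import csv

    # One-off fixup sketch: read stops.txt, rewrite the coordinate
    # fields from scientific notation to plain decimals, write it back.
    with open("stops.txt", newline="") as infile:
        reader = csv.DictReader(infile)
        fields = reader.fieldnames
        rows = list(reader)

    for row in rows:
        for field in ("stop_lat", "stop_lon"):
            # six decimal places is roughly 0.1 m of precision
            row[field] = "%.6f" % float(row[field])

    with open("stops_fixed.txt", "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)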
> What format were you looking for the osm data? CSV? I can produce a
> quick export if you need.
I'd prefer OSM data, as that's the kind that libroutez already knows
how to parse. I'm disinclined to write a custom handler for CSV data.
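For what it's worth, pulling the nodes out of an .osm file only takes
a few lines with a streaming XML parser. A rough sketch (not the
actual libroutez parser):

    import xml.etree.ElementTree as ET

    # Stream the XML so a large .osm file never has to fit in memory
    # all at once; collect node id -> (lat, lon).
    def read_nodes(filename):
        nodes = {}
        for event, elem in ET.iterparse(filename):
            if elem.tag == "node":
                nodes[elem.get("id")] = (float(elem.get("lat")),
                                         float(elem.get("lon")))
            elem.clear()  # free the element once it has been read
        return nodes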
> If you want the original osm data, you can use the method described
> at http://wiki.openstreetmap.org/wiki/Getting_Data
Unfortunately OpenStreetMap places some pretty heavy restrictions on
how big a subset of data you can download through their web API these
days, and the prospect of parsing the gigantic planet.osm file is a
bit daunting.
If you don't have a copy of the original OSM data you used handy, I
guess I could just process the GeoBase dataset for Ontario. Hope that
doesn't overload my laptop's 4 gigs of RAM. :)
--
William Lachance
wrl...@gmail.com
wget --timeout=0
"http://www.informationfreeway.org/api/0.5/*[bbox=-80.105,43.417,-78.672,44.245]"
-O gta.osm
The result (as of Sept 08) can be found here:
http://media.myttc.ca/gta.osm.gz (2.5 MB gzipped / 31.6 MB unpacked)
I agree that exponential notation is crap - I think it's because of the
string length. Might be good to chop a few decimal places off the
lat/lng values anyway... it would save a fair bit of space. I don't
think we need accuracy beyond 5 places, if memory serves.
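Concretely, 1e-5 of a degree of latitude is about a metre on the
ground, so five places is plenty. In Python (just for illustration,
whatever language the generator ends up in):

    lat, lon = 43.8062744141, -79.5270690918
    print("%.5f,%.5f" % (lat, lon))   # prints: 43.80627,-79.52707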
Hope that helps, Will - can't wait to see your results!
Cheers,
Kieran
Awesome, Kieran! That's exactly what I was looking for.
> I agree that exponential notation is crap - I think it's because of
> the string length. Might be good to chop a few decimal places off
> the lat/lng values anyway... it would save a fair bit of space. I
> don't think we need accuracy beyond 5 places, if memory serves.
OK, I bit the bullet and massaged the data in Emacs using some
ridiculous regular expressions:

    0\.\([0-9][0-9]\)\([0-9][0-9][0-9]\).*E2,\-0\.\([0-9][0-9]\)\([0-9][0-9][0-9]\).*E2 -> \1.\2,-\3.\4

as well as:

    0\.\([0-9][0-9]\)\([0-9][0-9][0-9]\).*E2,\-0\.\([0-9][0-9]\)\([0-9]\).*E2 -> \1.\2,-\3.\4
I keep forgetting the nuances of Emacs' RE syntax, so those are
somewhat less elegant than they could be. Anyway, Google's transit
feed module now seems to be happily processing the data, and I should
know in half an hour or so what libroutez makes of your data. :)
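(For the record, a couple of lines of Python would have been less
hairy than those regexes - converting any E-notation number it finds
into a plain decimal:)

    import re

    # Rewrite every E-notation float on a line as a plain decimal,
    # instead of hand-matching the digit groups.
    def fix_coords(line):
        return re.sub(r"-?\d+\.\d+E[+-]?\d+",
                      lambda m: "%.6f" % float(m.group(0)),
                      line)

    print(fix_coords("0.438062744141E2,-0.795270690918E2"))
    # prints: 43.806274,-79.527069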
--
William Lachance <wrl...@gmail.com>