Daniel,
Thank you for sharing your code. I agree, pushing out the whole
shebang helps with understanding the changes better.
I've built your fork of the converter from the sources you posted and
ran it against my validation data sets and against published TfL data
that is a couple of weeks old. Here are my observations, findings and a
few recommendations:
(1) Your fork writes the content of TXC/<LineName> into the GTFS/
route_long_name field of routes.txt. The original converter populates
route_short_name instead. I've posted an enhancement request in the
project's issues database to add a switch to the converter
configuration file that controls where the content of TXC/<LineName>
goes.
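To illustrate what I have in mind for the switch, here is a rough sketch. The class name, the enum and the "short"/"long" setting values are made up for illustration and are not in the converter; only the two GTFS field names are real.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not converter code: routes TXC/<LineName> to either
// route_short_name or route_long_name depending on a configured switch.
class LineNameMapping {

    enum Target { ROUTE_SHORT_NAME, ROUTE_LONG_NAME }

    // Interprets an assumed configuration value. Anything other than "long"
    // falls back to route_short_name, which is what the original converter does.
    static Target parseTarget(String configured) {
        if (configured != null && configured.trim().equalsIgnoreCase("long")) {
            return Target.ROUTE_LONG_NAME;
        }
        return Target.ROUTE_SHORT_NAME;
    }

    // Returns the two name fields of a routes.txt row; the field that is not
    // selected is left empty.
    static Map<String, String> routeNameFields(String lineName, Target target) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("route_short_name", target == Target.ROUTE_SHORT_NAME ? lineName : "");
        fields.put("route_long_name", target == Target.ROUTE_LONG_NAME ? lineName : "");
        return fields;
    }
}
```

With no setting present, parseTarget(null) yields ROUTE_SHORT_NAME, so existing configurations would keep their current behaviour.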
(2) Where stops do not possess stop coordinates, you've removed the
OpenRequired entries and now leave the stop coordinate fields empty.
OpenRequired was introduced in an effort to help pinpoint deficiencies
of the GTFS output. I believe either way is fine, as the GTFS feed
validator will not pass stops without coordinates in any case (at
least that's been the way in the past).
(3) The format of the NaPTAN stops.csv file that you've used seems to
be different from what I pull once a month and what other users have
(to my knowledge). It appears to be version 2, as issue 334 suggests. The difference is
in the stop column names, "AtcoCode" instead of "ATCOCode", or
"Latitude" instead of "Lat", for example. If you do not need the
NaPTAN stop helper function that creates stop names based on the
"usable stop name rules", I recommend you use the NaPTAN Stop Column
Pick feature that can be set up in the converter's configuration file.
This way, you can account for the differences. For your reference, the
original converter assumes the following stop file headline:
"ATCOCode","GridType","Easting","Northing","Lon","Lat","CommonName","Identifier","Direction","Street","Landmark","NatGazID","NatGazLocality","ParentLocality","GrandParentLocality","Town","Suburb","StopType","BusStopType","BusRegistrationStatus","RecordStatus","Notes","LocalityCentre","SMSNumber","LastChanged"
As is, your fork of the converter will not run when the above format
is presented to it, and many folks are still using it.
I will however look into adding a switch in the converter
configuration file to be able to accept the v2 format as well. This
way, v2 can be covered and we have backwards compatibility.
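One way to accept both header spellings would be to look columns up by name rather than by fixed position. The sketch below is only an illustration of that idea and is not how the converter currently reads the file. The class and the logical field names are hypothetical, and it only knows the two differences mentioned above (ATCOCode/AtcoCode and Lat/Latitude); other columns would need their own entries.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not converter code: finds the column index of a few
// NaPTAN stop fields by header name, accepting either spelling.
class NaptanHeaderResolver {

    // Logical field -> accepted header names. Only the pairs named in this
    // thread are listed; this is not a complete description of either format.
    private static final Map<String, List<String>> ALIASES = new HashMap<>();
    static {
        ALIASES.put("atcoCode", Arrays.asList("ATCOCode", "AtcoCode"));
        ALIASES.put("lat", Arrays.asList("Lat", "Latitude"));
    }

    // Takes the first line of stops.csv and returns logical field -> index.
    // Throws if a field cannot be found under any of its known names.
    static Map<String, Integer> resolve(String headerLine) {
        String[] names = headerLine.split(",");
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            // Strip the surrounding quotes used in the stop file headline.
            positions.put(names[i].trim().replace("\"", ""), i);
        }
        Map<String, Integer> resolved = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : ALIASES.entrySet()) {
            for (String candidate : entry.getValue()) {
                Integer index = positions.get(candidate);
                if (index != null) {
                    resolved.put(entry.getKey(), index);
                    break;
                }
            }
            if (!resolved.containsKey(entry.getKey())) {
                throw new IllegalArgumentException(
                        "No column found for " + entry.getKey() + ", tried " + entry.getValue());
            }
        }
        return resolved;
    }
}
```

Failing loudly on an unknown headline is deliberate here: a stop file in an unexpected format should stop the run rather than produce stops with shifted columns.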
(4) You've forked version 1.5 of the original converter. I've released
version 1.6 since. It includes a few changes to the handling of
calendar dates; I do not know whether that might yield improvements on
your end.
(5) I let the converter fork loose on the TfL published timetable data,
about two weeks old. The script, which spends most of the time
executing the converter, ran for 3h11min on my box, which is a 2008
dual Core Mac, 16bit Java mode. By comparison, the original converter
took 3h20min for the same input data set and that was while I was
using the box for some other stuff. This means I do not see the major
performance improvements that you suggest. I suspect you may have a
problem where the converter falls into an exception handler unnoticed,
suggesting much better performance but in reality skipping large chunks
of data. This is pure speculation on my part though.
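A cheap way to confirm or rule this out would be to count what gets skipped instead of swallowing it quietly. The sketch below shows the general pattern only; the class is hypothetical and parseRecord is a stand-in for whatever per-record work the converter does, not an actual converter method.

```java
import java.util.List;

// Hypothetical sketch, not converter code: tallies converted and skipped
// records so that a swallowed exception shows up in the totals.
class ConversionTally {
    int converted;
    int skipped;

    // Stand-in for the real per-record conversion; here it just parses a number.
    static int parseRecord(String raw) {
        return Integer.parseInt(raw.trim());
    }

    static ConversionTally run(List<String> records) {
        ConversionTally tally = new ConversionTally();
        for (String record : records) {
            try {
                parseRecord(record);
                tally.converted++;
            } catch (RuntimeException e) {
                // Count and report the failure instead of dropping it silently.
                tally.skipped++;
                System.err.println("Skipped record '" + record + "': " + e);
            }
        }
        return tally;
    }
}
```

If the skipped count is large compared to the converted count on the same input, the shorter run time is coming from missing data rather than from faster code.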
Conclusion:
On a functional level, your fork boils down to changes following
findings (1) and (3) above. I might uncover a few more things as I dig
into it. As in the past, isolated change requests will be introduced
into the main line of the converter. I think I'm not going to go for
the repackaging of the Java classes of the converter however. Your
take on it is nicer than the current structure, but does not yield
significant functional improvements. With that, don't get me wrong, it
is great to see others take a stab at the converter code and get their
hands dirty. Great effort, which I appreciate very much because it helps
polish the converter.
JP
On Jun 15, 9:25 am, Daniel Thomas <
dr...@srcf.net> wrote:
> I have made some further significant changes to Transxchange2GoogleTransit
> in order to get it to parse the TNDS correctly. Among other things it now
> runs significantly more quickly - it takes significantly less time to
> produce the feed than to run the validator on the feed whereas before it
> took orders of magnitude more time.
> I have also moved quite a lot of code towards the standard java conventions
> on various things.
> Since there are quite a lot of changes and so filing an issue for each one
> and dealing correctly with dependencies between them would be quite a lot
> of overhead I have pushed all this to github:
> https://github.com/drt24/googletransitdatafeed and will keep that up to
> date with any further changes I make. Of course it is all Apache 2 and I
> would love it if this could all get upstream.
>
> I have particular concerns about the following changes I have made in that
> I am not sure that they are correct:
> https://github.com/drt24/googletransitdatafeed/commit/c5415065bc8c65f...