Motivation:
GTFS currently has an agency.txt file to provide information about the
agencies/operators that operate the services described by the feed.
However, the publisher of the feed is sometimes a different entity
than any of the operators (in the case of regional aggregators). In
addition, there are some fields in agency.txt that are really
feed-wide rather than agency-wide settings now that we allow multiple
agencies per feed. Finally, it would be useful to have an identifier
that a feed publisher can use to determine which version of their feed
is currently being used by a client.
Proposal:
A new optional feed_info.txt file will be added to the specification,
with the following fields.
feed_publisher_name (required):
The feed_publisher_name field contains the full name of the
organization that publishes the feed. (This may be the same as one of
the agency_name values in agency.txt.) GTFS-consuming applications
can display this name when giving attribution for a particular feed's
data.
feed_publisher_url (required):
The feed_publisher_url field contains the URL of the feed publishing
organization's website. (This may be the same as one of the
agency_url values in agency.txt.) The value must be a fully qualified
URL that includes http:// or https://, and any special characters in
the URL must be correctly escaped. See
http://www.w3.org/Addressing/URL/4_URI_Recommentations.html for a
description of how to create fully qualified URL values.
feed_timezone (required):
The feed_timezone field specifies the timezone in which the times in
the feed will be given. Any stops/stations that don't have a
stop_timezone specified are also assumed to be located in this
timezone. For feeds containing feed_info.txt, this value is used
instead of the agency_timezone values in agency.txt. Please refer to
http://en.wikipedia.org/wiki/List_of_tz_zones for a list of valid
values.
feed_lang (required):
The feed_lang field contains a two-letter ISO 639-1 code for the
default language used for the text in this feed. This setting helps
GTFS consumers choose capitalization rules and other language-specific
settings. Please refer to
http://www.loc.gov/standards/iso639-2/php/code_list.php for a list of
valid values. This value overrides any agency_lang values in
agency.txt.
feed_version (optional):
The feed publisher can specify a string here that indicates which
version of their feed this is. GTFS-consuming applications can
display this value to help feed publishers determine whether the
latest version of their feed has been incorporated.
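To make the proposed fields concrete, here is a minimal Python sketch of how a consumer might read feed_info.txt and apply the timezone precedence described above. The sample values, the function names parse_feed_info and effective_timezone, and the assumption that the file holds exactly one data row are illustrative only; they are not part of the proposal.

```python
import csv
import io

# Hypothetical sample contents of feed_info.txt; every value here is invented.
SAMPLE_FEED_INFO = (
    "feed_publisher_name,feed_publisher_url,feed_timezone,feed_lang,feed_version\n"
    "Example Regional Aggregator,http://www.example.com,America/Los_Angeles,en,20100529\n"
)

# Fields marked "(required)" in the proposal above.
REQUIRED_FIELDS = (
    "feed_publisher_name",
    "feed_publisher_url",
    "feed_timezone",
    "feed_lang",
)


def parse_feed_info(text):
    """Return the feed_info row as a dict, or raise ValueError.

    Assumption: the file contains exactly one data row.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    if len(rows) != 1:
        raise ValueError("expected exactly one data row, found %d" % len(rows))
    row = rows[0]
    missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return row


def effective_timezone(stop_timezone, feed_timezone, agency_timezone):
    """Pick the timezone for a stop.

    A stop's own stop_timezone wins; otherwise feed_timezone is used when
    feed_info.txt supplies one; agency_timezone is the fallback for feeds
    without feed_info.txt.
    """
    return stop_timezone or feed_timezone or agency_timezone


info = parse_feed_info(SAMPLE_FEED_INFO)
print(info["feed_publisher_name"], info["feed_version"])
print(effective_timezone("", info["feed_timezone"], "Europe/London"))
```

Since feed_version is optional, the sketch does not check it; a consumer would simply display whatever string is present.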
Comments?
Joe Hughes
Google
I love metadata, but I would need to view it to confirm it, and all of
those steps take time.
Sadly, the QA team does not like the inert hop approach, and in
production it is (often) removed. But if allowed and not distracting, I
found I can mouse over the agency URL and VOILA, I see http://myagency.com#051410
and know this version was posted a few weeks ago.
Hoping simplicity might be a virtue.
We have begun to use GTFS files as our standard for offering
information to our outside vendors who do work with us. Having a
version number (in whatever format) is important because we will have
to be on the same page when offering scheduling information to our
passengers. However, none of these projects are live yet, so we can't
give an example.
Devin
San Diego MTS
That version number is then used in the following places:
1) In the filename on disk. ("google_transit.zip" files aren't particularly helpful in identification -- TriMet_20100529_20100529.zip)
2) In the database name once the dataset has been imported (ex: TQ-TriMet-20100529-20100529)
3) Once the dataset is generated into a schedule database for my application (TransitQ), the version number is embedded inside of the schedule database so that the application can tell when the schedule data has been updated by the user (this is necessary to update the user's favorite stop lists, etc)
4) The version number is also used in the filename of the schedule database that the end-user downloads (http://www.transitq.com/download). They use this to determine if there is a new schedule available or not.
5) The version number is used in the filename that's stored on the Palm (which is independent of the *.pdb filename and *.zip filename)
6) The version number is used to differentiate datasets on my GTFS browsing site
- Look how ugly it is to determine version on this page: http://gtfs.transitq.com/
- It's then embedded into the URL once you select a dataset (http://gtfs.transitq.com/TriMet_20100430_20100430/)
7) It looks like GTFS data exchange goes through a similar process. They generate filenames like lane-transit-district_20100528_0227.zip.
It seems senseless to have each GTFS consumer reinvent the wheel with their own version number scheme. We should have an ascending version number that (even better) fits some standard form (ex: YYYYMMDDHHMMSS) so that it could easily be placed into filenames, database names, web URLs, used for keeping archives, etc.
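As a rough illustration of the YYYYMMDDHHMMSS idea, the Python sketch below builds such a version string from a publication time and reuses it in a filename. The function names make_feed_version and feed_filename and the example timestamps are hypothetical. The useful property is that fixed-width timestamps sort chronologically even when compared as plain strings.

```python
from datetime import datetime


def make_feed_version(published_at):
    """Format a publication time as a fixed-width YYYYMMDDHHMMSS string."""
    return published_at.strftime("%Y%m%d%H%M%S")


def feed_filename(agency_slug, version):
    """Build an archive name such as TriMet_20100529022700.zip."""
    return "%s_%s.zip" % (agency_slug, version)


# Hypothetical publication times, deliberately out of order.
published = [
    datetime(2010, 5, 29, 2, 27, 0),
    datetime(2010, 4, 30, 0, 0, 0),
    datetime(2010, 5, 28, 23, 59, 59),
]
versions = [make_feed_version(p) for p in published]

# Plain string comparison is enough to find the newest version.
latest = max(versions)
print(feed_filename("TriMet", latest))
```

A consumer holding two version strings can therefore tell which is newer without parsing them back into dates.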
When it comes to version numbers, I think we should instead be asking ourselves why it makes sense to have a non-ascending version number. I can't think of any place I've seen a non-ascending version number that wasn't really annoying. Does the Windows version numbering scheme make sense? (Windows 3.0, 3.1, 95, 98, XP, Me, Vista, 7)
Well that's my $.02 anyway.
- Max
This proposal looks ripe. Let me summarize its full status, and maybe we can then move forward and update the spec if there is consensus.
Motivation:
GTFS currently has an agency.txt file to provide information about the
agencies/operators that operate the services described by the feed.
However, the publisher of the feed is sometimes a different entity
than any of the operators (in the case of regional aggregators). In
addition, there are some fields in agency.txt that are really
feed-wide rather than agency-wide settings now that we allow multiple
agencies per feed. It would also be useful to include data about the feed
itself: examples include an identifier that a feed publisher can use to
determine which version of their feed is currently being used by a client,
and an explicit period of time during which the feed is providing accurate
and complete schedules (see detailed motivation in Arno's previous
reply on the thread).
Proposal:
A new optional feed_info.txt file will be added to the specification,
with the following fields.
feed_publisher_name (required):
The feed_publisher_name field contains the full name of the
organization that publishes the feed. (This may be the same as one of
the agency_name values in agency.txt.) GTFS-consuming applications
feed_valid_from (required)
feed_valid_until (required):
The feed provides complete and reliable schedule information for
service in the period from the beginning of the feed_valid_from day to
the end of the feed_valid_until day. Both days are given as dates in
YYYYMMDD format as for calendar.txt, or left empty if unavailable. The
feed_valid_until date must not precede the feed_valid_from date if
both are given. Feed providers are encouraged to give schedule data
outside this period to advise of likely future service, but feed
consumers should treat it with its non-authoritative status in mind.
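The validity rules above could be checked along these lines. This is only a sketch: it assumes dates use the same YYYYMMDD format as calendar.txt, and it treats an empty bound as open-ended, which is one possible reading of "left empty if unavailable". The function names are hypothetical.

```python
from datetime import date, datetime


def parse_gtfs_date(value):
    """Parse a YYYYMMDD string into a date; return None for an empty value."""
    value = (value or "").strip()
    if not value:
        return None
    return datetime.strptime(value, "%Y%m%d").date()


def validity_period(feed_valid_from, feed_valid_until):
    """Return (start, end), rejecting an end date that precedes the start."""
    start = parse_gtfs_date(feed_valid_from)
    end = parse_gtfs_date(feed_valid_until)
    if start is not None and end is not None and end < start:
        raise ValueError("feed_valid_until precedes feed_valid_from")
    return start, end


def is_authoritative(service_day, start, end):
    """True if service_day falls inside the inclusive validity period.

    Assumption: a missing bound (None) is treated as open-ended.
    """
    if start is not None and service_day < start:
        return False
    if end is not None and service_day > end:
        return False
    return True


start, end = validity_period("20100501", "20100831")
print(is_authoritative(date(2010, 8, 31), start, end))
print(is_authoritative(date(2010, 9, 1), start, end))
```

A consumer might still show trips outside the period, but flag them as provisional rather than dropping them.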
feed_version (optional):
I am puzzled by:
feed_valid_from (required)
feed_valid_until (required):
Yet the text describing these suggests that one or both can be missing ...
We already include a file of this nature with the data we supply to Google – but it is without these two elements. As a regional aggregator in the UK, there are no “standard dates” for the start or end dates of timetables ... so the best we could do would be to give a workaround – feed_valid_from would have to be the date on which the data file was exported from our databases, whilst feed_valid_until is rather more of a problem, as we cannot guarantee any date on which changes might happen. Although in theory an operator has to give 56 days' notice of any change, in practice more than 50% of applications to change are made under “short notice” provisions, meaning that less than 56 days' notice is given. And whatever notice is given, there are the usual processes which take time before an exportable version of the data is available. We tend to give a look-ahead of 3 or 4 months on our journey planners ... but with the caveat to plan again nearer the time to check that nothing has changed in the meantime.
One further point – is there any indication that the content of this file is going to be used in Google Transit? Until we can be sure it is being used, we will still have to maintain legacy elements of data that we can be sure are published by Google – and that rather denies the benefits of the additional file. But we certainly support the existence of the file – and assume these points can be taken into account.
Roger Slevin
Traveline south east (and east midlands and east anglia) - UK