removing PublicFeeds wiki page in favor of gtfs-data-exchange.com


Jehiah Czebotar

Nov 21, 2009, 11:56:43 PM
to gtfs-c...@googlegroups.com
Joe Hughes previously started a thread about the general topic of more
community ownership and participation in GTFS governance. One step of
that was removing the existing PublicFeeds index page which is no
longer 100% comprehensive of all public gtfs data available. (this is
also a desired step to help reduce confusion about what constitutes a
public feed, and where that list can be found)

I have made quite a bit of progress in updating gtfs-data-exchange.com
to better list indexes of all GTFS schedule data available on the web,
and to provide appropriate links to the source locations, and Licenses
(where applicable).

So, i think it's time to take that step, and I am looking for a
general thumbs up from this list to move that process forwards.

The steps for an agency to be listed would then be expanded to a)
email googletran...@googlegroups.com (same as before) or b)
upload their data to gtfs-data-exchange.com

old: http://code.google.com/p/googletransitdatafeed/wiki/PublicFeeds
new: http://www.gtfs-data-exchange.com/agencies

--
Jehiah

Tom Brown

Nov 23, 2009, 4:29:20 AM
to gtfs-c...@googlegroups.com
I think a strength of the PublicFeeds page is that it acknowledges only the agencies that have solved the technical, legal and bureaucratic obstacles to providing an official feed anybody can instantly download. The feeds you and other advocates have fetched through back doors, FOIA, etc. are useful but I want to encourage official feeds. How about making a different page that only lists agencies with an Official Feed Source URL? The people and organizations publishing them deserve special acknowledgement for their efforts. Maybe some sort of bold text or icon in the list of all agencies would be even better than a separate page.

Jehiah Czebotar

Nov 23, 2009, 11:21:04 AM
to gtfs-c...@googlegroups.com
On Mon, Nov 23, 2009 at 4:29 AM, Tom Brown <tom.bro...@gmail.com> wrote:
> On Sun, Nov 22, 2009 at 05:56, Jehiah Czebotar <jeh...@gmail.com> wrote:
>>
>> Joe Hughes previously started a thread about the general topic of more
>> community ownership and participation in GTFS governance.  One step of
>> that was removing the existing PublicFeeds index page which is no
>> longer 100% comprehensive of all public gtfs data available. (this is
>> also a desired step to help reduce confusion about what constitutes a
>> public feed, and where that list can be found)
>>
>> I have made quite a bit of progress in updating gtfs-data-exchange.com
>> to better list indexes of all GTFS schedule data available on the web,
>> and to provide appropriate links to the source locations, and Licenses
>> (where applicable).
>>
>> So, i think it's time to take that step, and I am looking for a
>> general thumbs up from this list to move that process forwards.
>>
>> The steps for an agency to be listed would then be expanded to a)
>> email googletran...@googlegroups.com (same as before) or b)
>> upload their data to gtfs-data-exchange.com
>>
>> old: http://code.google.com/p/googletransitdatafeed/wiki/PublicFeeds
>> new: http://www.gtfs-data-exchange.com/agencies
>>
>
> I think a strength of the PublicFeeds page is that it acknowledges only the
> agencies that have solved the technical, legal and bureaucratic obstacles to
> providing an official feed anybody can instantly download.

I agree. I was remiss in not including a link to the page that shows
just that sort of information
http://www.gtfs-data-exchange.com/agencies/astable

> The feeds you and
> other advocates have fetched through back doors, FOIA, etc are useful but I
> want to encourage official feeds. How about making a different page that
> only lists agencies with an Official Feed Source URL? The people and
> organizations publishing them deserve special acknowledgement for their
> efforts. Maybe some sort of bold text or icon in the list of all agencies
> would be even better than a separate page.
>

Also i think it's good to differentiate between the two meanings of "official".

one is "official" in regards to the data aka "official data"; ie:
generated by an agency, or by a company on behalf of an agency. This
sort of "official data" includes both data published publicly by an
agency, and those received directly from an agency via FOIA, email
requests, etc. The latter is still extremely useful for developers.

the second is "official" in regards to distribution, aka "official
distribution". ie: an agency (or company on behalf of) that publishes
data on the web.

I've tried to clarify and distinguish between these two cases, and
developer generated feeds on gtfs-data-exchange by listing the source
url, and license where appropriate and also categorizing agency pages
as "Official GTFS Data" or "No Official GTFS Data Source Available"

--
Jehiah

Martin Akerman

Nov 23, 2009, 11:28:23 AM
to gtfs-c...@googlegroups.com
I think an "official feed" link site fits the larger GTFS model better.
Larger agencies will start to create feed packages on the fly
(on-demand) and there should only be 1 source for the latest version
(the agency).

Many not-yet-participating agency stakeholders are hesitant to adopt
GTFS as a means to reach their riders for a variety of reasons
(security, scrutiny about how they run their transit system, etc.)
There are a lot of people, most not in our message boards, that are
simply not ready to release their system's data to Google. This
distribution channel may add some complexity to the acceptance of
GTFS.

As Tom mentioned, there are a lot of technical, legal and
bureaucratic obstacles that we have to consider when making this
choice.

My vote: LINK SITE TO PARTICIPATING AGENCY FEEDS WITHOUT ACTUAL DATA PACKAGES +1

-Martin A.
CUTR

Jehiah Czebotar

Nov 23, 2009, 12:44:43 PM
to gtfs-c...@googlegroups.com
thanks for the feedback; a few follow-up thoughts

On Mon, Nov 23, 2009 at 11:28 AM, Martin Akerman <make...@gmail.com> wrote:
> I think an "official feed" link site fits the larger GTFS model better.
> Larger agencies will start to create feed packages on the fly
> (on-demand) and there should only be 1 source for the latest version (
> the agency).

i'm not sure what you mean by 'on demand', or how an archive of past
gtfs files hurts that. Also this seems like a prediction which is hard
to discuss. Is there a specific agency/case you are talking about that
creates and publishes feeds more than once a day?

Also i think your point about needing 1 source goes the other way as
well. As a developer (from personal experience) it's impossible to
keep up-to-date by checking 10 different agency websites regularly,
let alone 50 or 100. gtfs-data-exchange is designed to be a single
go-to location without ever hiding or precluding using an agency's
website directly. (it's also very much designed to point to the agency
website, as it's hard to find where feeds are published from an
agency's home page)

>
> Many not-yet-participating agency stakeholders are hesitant to adopt
> GTFS as a means to reach their riders for a variety of reasons
> (security, scrutiny about how they run their transit system, etc.)
> There are a lot of people, most not in our message boards, that are
> simply not ready to release their system's data to Google. This
> distribution channel may add some complexity to the acceptance of
> GTFS.

Yes, there are things agencies need to work through to publish their
data, i don't think this affects that equation as gtfs-data-exchange
only has a part in the equation after they make the choice to release
data.

>
> As Tom mentioned, there are a lot of  technical, legal and
> bureaucratic obstacles that we have to consider when making this
> choice.

I don't think there are any technical, legal or bureaucratic obstacles
with indexing agencies that publish feeds, or with changing from
indexing on the wiki vs gtfs-data-exchange. Those sorts of issues are
within an agency with regards to their decision to publish schedules
or not.

>
> My vote: LINK SITE TO PARTICIPATING AGENCY FEEDS WITHOUT ACTUAL DATA PACKAGES +1
>
> -Martin A.
> CUTR


--
Jehiah

J. R. Westmoreland

Nov 23, 2009, 12:55:04 PM
to gtfs-c...@googlegroups.com
I'll put in a couple of comments here for the record.
In the case that I'm designing to handle, it would be nice to be able to go
there and download each feed as part of a batch process. This would make the
logic simpler to handle. Also, even if running by hand you could
remember one site but not a bunch of them.
We will only run the process once per month, or something like that.
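
A minimal sketch of such a batch process, assuming the caller already has a mapping from agency name to feed URL. The thread does not describe any download API, so the `feed_urls` mapping, the function names, and the example URL are all hypothetical; the `fetch` parameter exists only so the loop can be tried without network access.

```python
import os
import urllib.request
from typing import Callable


def fetch_url(url: str) -> bytes:
    """Download one URL and return the response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_feeds(
    feed_urls: dict[str, str],
    dest_dir: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> list[str]:
    """Save each feed as <dest_dir>/<agency>.zip and return the paths written."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for agency, url in feed_urls.items():
        path = os.path.join(dest_dir, f"{agency}.zip")
        with open(path, "wb") as out:
            out.write(fetch(url))
        written.append(path)
    return written
```

A monthly job would call `download_feeds({"example-agency": "http://example.invalid/feed.zip"}, "feeds")` with real URLs taken from whichever index it trusts.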

J. R.

--------------------
J. R. Westmoreland
Custom Computers & Consulting
E-mail: j...@jrw.org
Twitter: GeneralJR
Skype: j.r.westmoreland

Joe Hughes

Nov 23, 2009, 1:20:50 PM
to gtfs-c...@googlegroups.com
Hi Jehiah,

Thanks for starting this conversation, and thanks for your work on the
gtfs-data-exchange.com website. It's really helpful for developers to
be able to ensure that they have the latest data from each agency.

First, could you clarify the intent of your proposal WRT the GTFS
document? The gtfs-data-exchange.com website is already mentioned in
the "Making a Transit Feed Publicly Available" section of the
document.

Second, could you clarify whether the gtfs-data-exchange.com site
offers a view that shows only data that has been officially released
to developers by the agency?

Third, is there a way for an agency to simply provide the URL of their
feed to your site, rather than uploading? Requiring a manual upload
every time the data changes seems like it would increase the amount of
work for an agency, increasing the chance that the data on
gtfs-data-exchange.com would be less fresh than what's on the agency's
own website.

Thanks,
Joe

Martin Akerman

Nov 23, 2009, 1:49:49 PM
to gtfs-c...@googlegroups.com
When I say on-demand, I mean agencies will be publishing as many times
as they are queried. A simple php, perl, java or ruby script can
create the text files and zip them on-demand so that a package is
fresh and hot off the press every time google or anybody fetches the
files. I don't know of any agencies doing that at the moment but maybe
somebody else here does.
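
As a rough illustration of on-demand packaging, here is a sketch in Python rather than any of the languages named above. It assumes the schedule tables are already in memory as lists of rows; a real agency script would query its own scheduling database instead, and the function name `build_feed_zip` is made up for this example.

```python
import csv
import io
import zipfile


def build_feed_zip(tables: dict[str, list[dict[str, str]]]) -> bytes:
    """Build a GTFS-style zip in memory from {file name: rows}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_name, rows in tables.items():
            text = io.StringIO()
            # Column order comes from the first row; every table is
            # assumed to be non-empty and to share the same keys per row.
            writer = csv.DictWriter(text, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
            archive.writestr(file_name, text.getvalue())
    return buffer.getvalue()
```

A web handler could return these bytes directly for each request, for example `build_feed_zip({"stops.txt": [{"stop_id": "1", "stop_name": "Main St"}]})`.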

Just to illustrate, I created an export tool for a different purpose
in 2006 that shows a file generated on-demand.
http://floridatransitindicators.org/detail.php?chart=5a -> Use "XML
Data" and "Most Recent Excel Data" to see "XML-RPC" and "CSV
on-demand" in action.
It is only a little out of date because it has not been updated but I
think it illustrates what I'm getting at.

If what you are suggesting is a historical index, I'm for it. However,
I'd be sure to get permission from the agencies you index.
The indexing system you are speaking of would not hurt the publishing
of new GTFS files and I can see the value to having an archive for
historical information.

Metrolink is a perfect example of an agency that may not want to be
included in the index.
http://www.metrolinktrains.com/tripplanner/schedule_data.php
They have some rules before the package can be downloaded. They also
state "Keep your work up to date. Check this page frequently and note
when schedules are updated. Please don't distribute the raw files: We
want to avoid out-of-date versions of schedules and other information
being circulated".

I like the web site you put together.
I'd still like agencies to host their packages so as not to interfere
with the evolution of distribution of transit data.
The future is most likely in some form of XML-RPC variation hosted at
the agency and not in large CSVs.

-Martin A.
CUTR

Jehiah Czebotar

Nov 23, 2009, 3:37:07 PM
to gtfs-c...@googlegroups.com
On Mon, Nov 23, 2009 at 1:49 PM, Martin Akerman <make...@gmail.com> wrote:
> When I say on-demand, I mean agencies will be publishing as many times
> as they are queried. A simple php, perl, java or ruby script can
> create the text files and zip them on-demand so that a package is
> fresh and hot off the press every time google or anybody fetches the
> files. I don't know of any agencies doing that at the moment but maybe
> somebody else here does.
>

in that sense of on-demand, gtfs-data-exchange does md5 checks on the
zip files when retrieved to ensure uniqueness, but it will probably do
md5's on the contained files in the future. ie: you can't currently
re-upload the same file multiple times. (also as far as on-demand
publishing goes, it (to me) almost indicates an even stronger need for
a developer to know when the underlying data changes; do i refresh my
system every month, day, hour, minute? )
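
A minimal sketch of the two kinds of check, assuming nothing about how gtfs-data-exchange actually implements them; the function names and the in-memory `seen` set are hypothetical. Hashing the whole zip catches a byte-identical re-upload, while hashing the contained files also recognises a feed that was re-zipped with new timestamps but unchanged contents.

```python
import hashlib
import io
import zipfile


def archive_digest(zip_bytes: bytes) -> str:
    """MD5 of the whole zip file, exactly as retrieved."""
    return hashlib.md5(zip_bytes).hexdigest()


def member_digests(zip_bytes: bytes) -> dict[str, str]:
    """MD5 of each file inside the zip, keyed by file name."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: hashlib.md5(archive.read(name)).hexdigest()
            for name in archive.namelist()
        }


def is_new_upload(zip_bytes: bytes, seen: set[str]) -> bool:
    """Record the archive digest and report whether it was unseen."""
    digest = archive_digest(zip_bytes)
    if digest in seen:
        return False
    seen.add(digest)
    return True
```

Comparing `member_digests` of the newest and previous archive is one way a developer could tell whether the underlying data changed, independent of how often the zip itself is regenerated.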

> Just to illustrate, I created an export tool for a different purpose
> in 2006 that shows a file generated on-demand.
> http://floridatransitindicators.org/detail.php?chart=5a -> Use "XML
> Data" and "Most Recent Excel Data" to see "XML-RPC" and "CSV
> on-demand" in action.
> It is only a little out of date because it has not been updated but I
> think it illustrates what I'm getting at.
>
> If what you are suggesting is a historical index, I'm for it. However,
> I'd be sure to get permission from the agencies you index.
> The indexing system you are speaking of would not hurt the publishing
> of new GTFS files and I can see the value to having an archive for
> historical information.

It's designed as both historical archival, and index of sources;
however, with regards to replacing the PublicFeeds page it's the index
of sources functionality that matters.

>
> Metrolink is a perfect example of an agency that may not want to be
> included in the index.
> http://www.metrolinktrains.com/tripplanner/schedule_data.php
> They have some rules before the package can be downloaded. They also
> state "Keep your work up to date. Check this page frequently and note
> when schedules are updated. Please don't distribute the raw files: We
> want to avoid out-of-date versions of schedules and other information
> being circulated".
>

gtfs-data-exchange does check for updated files daily, in keeping with
the goal of solving the out-of-date schedule problem, and the 'check
frequently' request. However, metrolink is contradictory on the terms
around usage of those files, as the license that page points to
clearly says:

"... hereby grants you (Licensee) non-exclusive, limited and revocable
rights to use, reproduce, and redistribute SCRRA Data (Data)..."

which, of course, is the whole point of publishing schedule data to
developers in the first place; so it can be re-distributed and get
into the hands of riders.

> I like the web site you put together.

thanks

> I'd still like agencies to host their packages so as not to interfere
> with the evolution of distribution of transit data.

me too; I want agencies to be directly involved in publishing data,
and i don't want gtfs-data-exchange to be a required middleman.

> The future is most likely in some form of XML-RPC variation hosted at
> the agency and not in large CSVs.

i doubt many small agencies will ever move beyond static schedule
files, but yes some larger agencies will be moving towards more
interactive api endpoints, especially with regards to realtime data;
but that's a different topic.

Jehiah Czebotar

Nov 23, 2009, 3:58:28 PM
to gtfs-c...@googlegroups.com
On Mon, Nov 23, 2009 at 1:20 PM, Joe Hughes <joe.hug...@gmail.com> wrote:
> Hi Jehiah,
>
> Thanks for starting this conversation, and thanks for your work on the
> gtfs-data-exchange.com website.  It's really helpful for developers to
> be able to ensure that they have the latest data from each agency.
>
> First, could you clarify the intent of your proposal WRT the GTFS
> document?  The gtfs-data-exchange.com website is already mentioned in
> the "Making a Transit Feed Publicly Available" section of the
> document.

yup; i could have been clearer

Currently the specification says:

> #Making a Transit Feed Publicly Available
> Many applications are compatible with data in the GTFS format.
>
> The simplest way to make a feed public is to host it on a web server and
> publish an announcement that makes it available for use.
>
> Here are a few ways that interested software developers learn about public feeds:
> * A [list of transit agencies who provide public feeds] is available on the GoogleTransitDataFeed project site.
> * The [GTFS Data Exchange website] allows developers to subscribe to announcements about new and updated feeds.

I'm specifically suggesting that the bulleted section be changed to

The [GTFS Data Exchange website] contains an [index of public feeds]
and allows developers to subscribe to feed updates.

and that the PublicFeeds page be updated to remove its current
content, and point to http://gtfs-data-exchange.com/agencies/astable

(disclaimer: i'm not an english major; if anyone has better wording
suggestion, please speak up)

>
> Second, could you clarify whether the gtfs-data-exchange.com site
> offers a view that shows only data that has been officially released
> to developers by the agency?

gtfs-data-exchange doesn't offer an "official source" only view, but it
does clearly designate which listed agencies publish an official
source for gtfs data. If there is interest, I can add a filtered view
in addition to the current views (indexed by agency name, indexed by
agency location, indexed by last update, and complete list with links
to source & license)
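
A filtered view of this kind could be sketched as below. The `Agency` record and its `official_source_url` field are hypothetical names for illustration, not the site's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agency:
    name: str
    # URL where the agency itself publishes its feed, if it does.
    official_source_url: Optional[str] = None


def official_only(agencies: list[Agency]) -> list[Agency]:
    """Keep only agencies that publish an official feed source."""
    return [a for a in agencies if a.official_source_url]
```

Each existing index (by name, by location, by last update) could apply the same filter before rendering.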

>
> Third, is there a way for an agency to simply provide the URL of their
> feed to your site, rather than uploading?  Requiring a manual upload
> every time the data changes seems like it would increase the amount of
> work for an agency, increasing the chance that the data on
> gtfs-data-exchange.com would be less fresh than what's on the agency's
> own website.

Yes; agencies can still post an announcement with an url to their data
to the googletransitdatafeed group, or the transit-developers group
and I will add it to the list as has been happening for the past
year.

I will also add a spot on the gtfs-data-exchange website for an agency
to directly submit a link to their feed.

Tom Brown

Nov 23, 2009, 7:09:56 PM
to gtfs-c...@googlegroups.com
On Mon, Nov 23, 2009 at 9:58 PM, Jehiah Czebotar <jeh...@gmail.com> wrote:
> gtfs-data-exchange doesn't offer an "official source" only view, but it
> does clearly designate which listed agencies publish an official
> source for gtfs data. If there is interest, I can add a filtered view
> in addition to the current views (indexed by agency name, indexed by
> agency location, indexed by last update, and complete list with links
> to source & license)


The green stars at http://www.gtfs-data-exchange.com/agencies/astable are a good touch. Could you add something like that on the other pages (or at least the "All agencies" page linked from your homepage) that list agencies next to each other? Developers looking for useful data won't be very distracted by them and I think it will be hard to give agencies that do publish an official feed too much praise. It is their accomplishment that makes apps possible.

An "official sources" only view would be nice because it wouldn't have the distraction of agencies that aren't quite there yet. :-P

Joe Hughes

Nov 23, 2009, 7:11:34 PM
to gtfs-c...@googlegroups.com
Jehiah,

Thanks for the clarification. Can you say more about how the
gtfs-data-exchange site is administered? Will the site always be a
noncommercial endeavor?

Also, I'd like to hear thoughts from agency folks on this proposal.

Joe

Jehiah Czebotar

Nov 23, 2009, 8:43:37 PM
to gtfs-c...@googlegroups.com
On Mon, Nov 23, 2009 at 7:11 PM, Joe Hughes <joe.hug...@gmail.com> wrote:
> Jehiah,
>
> Thanks for the clarification.  Can you say more about how the
> gtfs-data-exchange site is administered?  Will the site always be a
> noncommercial endeavor?

Right now gtfs-data-exchange runs on top of AppEngine, and Amazon S3
with some background processing of uploads on a server I manage. The
cost of running it each month is less than I spend on coffee
in a day, so i'll be able to support that indefinitely; I am certainly
never going to add ads or start charging for access to GTFS feeds =)

The site doesn't require much maintenance; it's roughly the same code
base that's run for the past year, with a few recent additions to
handle better indexing.

That said, if anyone would like admin rights to help administer/manage
the list of urls that are crawled, and enter location and license
information as new feeds are added I'd be happy to grant admin rights.
(also, if anyone has a code feature they want to contribute; i'm happy
to grant code access as well). Neither of those things have happened
yet because no one has asked; but i'd welcome any contributors.

>
> Also, I'd like to hear thoughts from agency folks on this proposal.
>
me too.

Jehiah Czebotar

Nov 24, 2009, 12:41:44 AM
to gtfs-c...@googlegroups.com
On Mon, Nov 23, 2009 at 7:09 PM, Tom Brown <tom.bro...@gmail.com> wrote:
> On Mon, Nov 23, 2009 at 9:58 PM, Jehiah Czebotar <jeh...@gmail.com> wrote:
>>
>> gtfs-data-exchange doesn't offer an "official source" only view, but it
>> does clearly designate which listed agencies publish an official
>> source for gtfs data. If there is interest, I can add a filtered view
>> in addition to the current views (indexed by agency name, indexed by
>> agency location, indexed by last update, and complete list with links
>> to source & license)
>>
>
> The green stars at http://www.gtfs-data-exchange.com/agencies/astable are a
> good touch. Could you add something like that on the other pages (or at
> least "All agencies" page linked from your homepage) that list agencies next
> to each other? Developers looking for useful data won't be very distracted
> by them and I think it will be hard to give agencies that do publish an
> official feed too much praise. It is their accomplishment that makes apps
> possible.

I've carried the green stars through to the last few pages that didn't
have them yet to consistently designate when an official data source
is present.

> An "official sources" only view would be nice because it wouldn't have the
> distraction of agencies that aren't quite there yet. :-P

filter added to each index page

thanks for the feedback.

--
Jehiah

T Sobota

Nov 24, 2009, 3:54:56 PM
to Google Transit Feed Spec Changes
I would echo Martin's observation as far as serving as an actual
storage site for the raw GTFS data (versus a link repository)

Our system is drafting terms of use whose intent would be to grant the
licensee the right to "use" the data - as well as combine/arrange it
(i.e. in a third party app) - but would prohibit any publication (or
re-posting) of the raw text files.
Martin points out where the wording of one system would seem to at one
point target this same intent, while Jehiah points to language
elsewhere that would possibly permit such.


Tim Sobota
Metro Transit (Madison, WI)

On Nov 23, 12:49 pm, Martin Akerman <maker...@gmail.com> wrote:
> If what you are suggesting is a historical index, I'm for it. However,
> I'd be sure to get permission from the agencies you index.
> The indexing system you are speaking of would not hurt the publishing
> of new GTFS files and I can see the value to having an archive for
> historical information.
>
> Metrolink is a perfect example of an agency that may not want to be
> included in the index.
> http://www.metrolinktrains.com/tripplanner/schedule_data.php
> They have some rules before the package can be downloaded. They also
> state "Keep your work up to date. Check this page frequently and note
> when schedules are updated. Please don't distribute the raw files: We
> want to avoid out-of-date versions of schedules and other information
> being circulated".
>
> -Martin A.
> CUTR