Including data licensing/terms of use in GTFS spec


Melinda Morang

Mar 13, 2017, 7:48:36 PM
to General Transit Feed Spec Changes
Hi all.

Currently, GTFS creators don't seem to have a standardized method of writing and distributing the terms of use or licensing for their GTFS datasets.  I've seen webpages linked from the download location, text files included in the GTFS .zip file, no information at all, etc.

For someone like me who wants to harvest GTFS from all over the world and use it in a variety of applications, this is problematic. Anyone writing an app, doing an academic study, making a software demo, etc., has to figure out individually for each dataset whether it's okay to use it for that purpose. This quickly becomes unrealistic if you're using more than a handful of datasets.

How are other people handling this?

I know transit.land has attempted to display this information in a convenient way, but many (perhaps most?) of the agencies in their registry still have no information available at all.  And I can imagine it's taken a lot of effort on the part of the transit.land staff to get to this point already.

Would it be valuable to add some simple licensing categories to the GTFS spec itself, maybe in feed_info.txt, or else in a special new file (licensing.txt or some such)? Some simple categories of allowable uses could be offered as options in the file, along with some named license types (like Creative Commons). The agency could include a link to further terms and conditions. The new GTFS Best Practices document that was just released could strongly encourage agencies to include their licensing information and explain what the choices mean.

What does the GTFS community think?

Melinda Morang

Stefan de Konink

Mar 13, 2017, 8:01:34 PM
to gtfs-c...@googlegroups.com
On Tuesday, March 14, 2017 12:48:36 AM CET, Melinda Morang wrote:
> What does the GTFS community think?

My problem is that in The Netherlands I can guarantee that my entire feed
is CC-0. But if I want to make a pretty feed for Luxembourg, including
shapes, I would have a CC-0 timetable + ODbL shapes.txt. Hence the level of
copyright may vary even within a GTFS feed.

--
Stefan

Tony Laidig

Mar 14, 2017, 11:15:14 AM
to General Transit Feed Spec Changes
The cases of feeds that have multiple licenses are likely few, but important to specify.

In those cases, I see the following options in order of complexity:
1. The file contains one line with the "least permissive" license for the feed

2. When there are multiple licenses and the usage terms of those licenses do not agree (the data equivalent of GPL2 vs GPL3 vs LGPL), the file can specify operators and licenses for those operators

3. When the data for an operator is not under one license (e.g. shapes have been enhanced with OSM data), operators AND entities can be specified.

In the case of number 3,

agency_id,entities,license_shortname,license_url
a , all , PublicDomain , http://null.com
b , all , CC-BY , https://creativecommons.org/licenses/by/4.0/
b , shapes , ODbL , http://openstreetmap.org/license
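
One way a consumer might read a file like this, as a minimal Python sketch. The function names and the lookup rule (an entity-specific row overrides the agency's "all" row) are assumptions made for illustration; nothing here is part of the GTFS spec.

```python
import csv
import io

# Hypothetical contents of the proposed licensing file (not part of GTFS).
SAMPLE = """agency_id,entities,license_shortname,license_url
a , all , PublicDomain , http://null.com
b , all , CC-BY , https://creativecommons.org/licenses/by/4.0/
b , shapes , ODbL , http://openstreetmap.org/license
"""

def load_licenses(text):
    """Return {(agency_id, entities): (shortname, url)}, whitespace stripped."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        row = {key.strip(): value.strip() for key, value in row.items()}
        table[(row["agency_id"], row["entities"])] = (
            row["license_shortname"],
            row["license_url"],
        )
    return table

def resolve_license(table, agency_id, entity):
    """Assumed rule: an entity-specific row overrides the agency's 'all' row."""
    return table.get((agency_id, entity)) or table.get((agency_id, "all"))

licenses = load_licenses(SAMPLE)
print(resolve_license(licenses, "b", "shapes"))  # ('ODbL', 'http://openstreetmap.org/license')
print(resolve_license(licenses, "b", "stops"))   # falls back to the CC-BY 'all' row
```

With this rule, option 1 (a single line for the whole feed) is simply the case where every agency has only an "all" row.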

Sean Barbeau

Mar 14, 2017, 11:27:27 AM
to General Transit Feed Spec Changes
Melinda,
I personally think adding license info in the GTFS feed would be helpful, although it seems every time this comes up for discussion it gets derailed (pun intended).  One such past discussion:

As far as the state of the industry, here's what Aaron Antrim and I wrote in a recent TRB paper (http://bit.ly/TRB2017-GTFS):

Many agencies that have a “Terms of Use” agreement that application developers must agree to when using the agency’s data, such as TriMet (Portland, OR) (39), BART (SF Bay Area) (40), and Corona, CA (41). In TCRP Synthesis 115’s survey of 67 agencies, 29 (50.9%) of agencies reported that they require a license or agreement, with the top three included elements being:
  • Right to use the agency’s data
  • Nonguarantee of data availability, accuracy, or timeliness
  • Liability limitations for missing or incorrect data
Even though there will be cases like Stefan's that can't represent the entire feed under one license, I don't think this should prevent the community from defining a simple way to indicate license info in the feed itself in the case where the feed does fall under a single license.  Perhaps most importantly, this will get the agency thinking about what license they should choose when creating the GTFS data.

In terms of consolidating the industry on licenses, Transitland offers a "model license" here under "For data providers":

The USDOT's National Transit Map effort requires that participating agencies grant USDOT a Creative Commons Attribution 3.0 license:

Sean

[39] TriMet. Terms of Use. http://developer.trimet.org/terms_of_use.shtml. Accessed August 1, 2012.
[40] San Francisco Bay Area Rapid Transit District. Developer License Agreement. http://www.bart.gov/dev/schedules/license.htm. Accessed August 1, 2012.
[41] City of Corona. GTFS Data License Agreement. http://www.discovercorona.com/City-Departments/Public-Works/Transportation/GTFS.aspx. Accessed August 1, 2012.
[6] MapZen. TransitLand - An Open Project - For Data Providers. https://transit.land/an-open-project/. Accessed November 9, 2016.
[42] Anaheim Resort Transportation. GTFS data for developers. http://rideart.org/gtfs/. Accessed November 15, 2016.


Melinda Morang

May 17, 2017, 2:22:16 PM
to General Transit Feed Spec Changes
Bump!

I wanted to bring this topic up again, as it didn't gain much attention last time.  The thread Sean cited in his response was from 2009.  Another thread I found that discussed the possibility of including licensing info in feed_info.txt was of a similar vintage.

It's time for a real discussion of this problem.  Automated systems need to be able to determine easily which feeds they can and can't use for their applications based on simple and standardized licensing info!

~Melinda

Drew Dara-Abrams

May 17, 2017, 6:24:20 PM
to gtfs-c...@googlegroups.com
Thanks for restarting this thread, Melinda.

Licensing is certainly a hairy problem! Let me summarize a few potential additions to the GTFS spec to capture license information. In increasing order of use and complexity:

1. add a free-form column called "license" to feed_info.txt

- When it's a common license, the producer would set the column value to a SPDX identifier.
- When the producer hosts their own license terms online, the column value would be set to a URL.

2. add a free-form column called "license" to feed_info.txt AND allow a LICENSE.txt file

- When it's a common license, the producer would set the "license" value to a SPDX identifier.
- When it's a custom license, the producer would include the full contents of their license as a LICENSE.txt file. They could optionally give the license a name by specifying it as the "license" value in feed_info.txt

3. one of the above, along with additional columns in feed_info.txt that provide a concise interpretation of what the license allows

- "license_create_derived_product" can be set to "yes", "no", or "unknown"
- "license_redistribute" can be set to "yes", "no", or "unknown"
- "license_use_without_attribution" can be set to "yes", "no", or "unknown"
- "license_attribution_text" can optionally be set to a string when "license_use_without_attribution" == "no". This is the text that consumers would be required to display in their apps.

These are all of the fields that Transitland allows and stores. This scheme, of course, can't capture all possible licenses, but it does capture most of the important variation. (It's based on our own lawyers' review of ~30 major transit agencies in North America and the various terms they attach to their GTFS feeds.) More information is at https://transit.land/an-open-project/

TransitFeeds.com is considering adopting the same scheme: https://github.com/TransitFeeds/TransitFeeds-Public/issues/231
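
As a rough illustration of how a consumer might read the columns from option 3, here is a minimal Python sketch. The sample row, the helper names `read_license_info` and `can_build_app`, the URL-versus-SPDX heuristic, and treating a missing or empty value as "unknown" are all assumptions for illustration; none of these columns exist in the GTFS spec today.

```python
import csv
import io

# Hypothetical feed_info.txt using the proposed (non-standard) license columns.
FEED_INFO = (
    "feed_publisher_name,feed_publisher_url,feed_lang,license,"
    "license_create_derived_product,license_redistribute,"
    "license_use_without_attribution,license_attribution_text\n"
    "Example Transit,http://example.com,en,CC-BY-4.0,"
    "yes,yes,no,Data provided by Example Transit\n"
)

def read_license_info(text):
    """Return the license-related columns from the single feed_info.txt row."""
    row = next(csv.DictReader(io.StringIO(text)))
    value = row.get("license") or ""
    return {
        "license": value,
        # Assumed heuristic: a value that looks like a URL points to custom
        # terms; anything else is treated as an SPDX identifier.
        "is_url": value.startswith(("http://", "https://")),
        # Assumption: a missing or empty column means "unknown".
        "derived": row.get("license_create_derived_product") or "unknown",
        "redistribute": row.get("license_redistribute") or "unknown",
        "no_attribution": row.get("license_use_without_attribution") or "unknown",
        "attribution_text": row.get("license_attribution_text") or "",
    }

def can_build_app(info):
    """Conservative check: only an explicit "yes" counts as permission."""
    return info["derived"] == "yes" and info["redistribute"] == "yes"

info = read_license_info(FEED_INFO)
print(can_build_app(info))
if info["no_attribution"] != "yes":
    print("Attribution to display:", info["attribution_text"])
```

Treating anything other than an explicit "yes" as a refusal is a deliberately cautious choice for an automated harvester; a consumer with different risk tolerance could relax it.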

Re multiple licenses per feed: Note that Transitland has assumed to date that an entire feed is under one license. While this is an important situation to capture, I suspect the majority of feeds have one clear license. Feeds that need to specify multiple licenses could list them out in a "license" column in feed_info.txt or concatenate their licenses' contents in a single LICENSE.txt. It may not be possible to parse automatically, but would still be useful.

Do any of the above 3 options appeal to you all?

Drew

--
Drew Dara-Abrams, Ph.D.
head of mobility products
Mapzen: an open-source mapping lab

--
You received this message because you are subscribed to the Google Groups "General Transit Feed Spec Changes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gtfs-changes+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gtfs-changes/4e3fb1c6-3801-4c29-a88c-b8ca73e0baf4%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Sean Barbeau

May 24, 2017, 9:41:33 AM
to gtfs-c...@googlegroups.com
Thanks Drew for outlining this!

My preference would be 2+3.

For #3, was there a specific reason for allowing "unknown" as a valid value, versus just leaving out the field (I assume these are optional)?

Sean


Stefan de Konink

May 24, 2017, 10:02:04 AM
to Sean Barbeau, gtfs-c...@googlegroups.com
I have one concern. My biggest deviation from timetable data (CC-0) is the
origin of map data, or shapes.txt. For example, I could make an extremely
detailed shapes.txt if I were to license our feed under ODbL, which I will
always decline to do.

A pre-integration workaround might be to export OpenLR instead of map data,
but given that many of my end users wouldn't mind using OpenStreetMap or
other data sources at all, that really defeats the purpose.

It sounds strange to have a license per file, but I would at least like to
see a different license for shapes.txt.
Stefan