transfers.txt stop pairs must be unique


Aaron Antrim

Jul 15, 2009, 2:28:13 PM
to gtfs-c...@googlegroups.com
I received engineering feedback for a feed submitted to Google Transit
this week. This feedback indicated that stop pairs in transfers.txt
must be unique. In other words, the following transfers.txt would be
invalid:

from_stop_id,to_stop_id,transfer_type,min_transfer_time
STOPA,STOPB,0,
STOPA,STOPB,1,

Earlier, I had assumed that preferred and guaranteed timed transfers
could be set separately for the same stop pair. However, this does not
seem to be the case. Setting both for one stop pair wouldn't seem to be
very useful anyway.

I suggest that the GTFS spec include an additional sentence to
specify this: "Only one transfer type can be specified for a stop pair."

Joe Hughes

Jul 21, 2009, 4:34:49 PM
to gtfs-c...@googlegroups.com
This seems like a reasonable proposal, since it describes the behavior
of existing systems. Are there spec users that object to this
proposal?

Joe

Frank

Jul 22, 2009, 1:29:59 PM
to Google Transit Feed Spec Changes
Seems reasonable. But one question, since the following is legal:

from_stop_id,to_stop_id,transfer_type,min_transfer_time
S6,S7,2,300
S7,S6,3,

...maybe slightly reword Aaron's suggestion to "Only one transfer
type can be specified for an ordered stop pair" (i.e., to communicate
that S6-to-S7 and S7-to-S6 count as two different pairs).
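
A minimal sketch of how a validator might check the proposed rule, treating (from_stop_id, to_stop_id) as an ordered key. The function name `duplicate_ordered_pairs` is made up for illustration and is not part of any existing GTFS tool; the two sample tables are the ones quoted in this thread.

```python
import csv
import io
from collections import defaultdict


def duplicate_ordered_pairs(transfers_csv):
    """Map each ordered (from_stop_id, to_stop_id) pair that appears
    more than once to the file line numbers where it appears."""
    seen = defaultdict(list)
    reader = csv.DictReader(io.StringIO(transfers_csv))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        seen[(row["from_stop_id"], row["to_stop_id"])].append(line_no)
    return {pair: lines for pair, lines in seen.items() if len(lines) > 1}


# Same two stops in opposite directions: two distinct ordered pairs.
LEGAL = """from_stop_id,to_stop_id,transfer_type,min_transfer_time
S6,S7,2,300
S7,S6,3,
"""

# Same ordered pair listed twice with different transfer types.
INVALID = """from_stop_id,to_stop_id,transfer_type,min_transfer_time
STOPA,STOPB,0,
STOPA,STOPB,1,
"""

print(duplicate_ordered_pairs(LEGAL))    # {}
print(duplicate_ordered_pairs(INVALID))  # {('STOPA', 'STOPB'): [2, 3]}
```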

Aaron Antrim

Jul 22, 2009, 1:37:10 PM
to gtfs-c...@googlegroups.com
The suggested change "…ordered stop pair" makes more sense.  Thanks.

Tom Brown

Nov 4, 2009, 3:30:56 PM
to gtfs-c...@googlegroups.com
Any comments on this proposal before I request it to be added to the 
official spec? 

David Hodge

Nov 4, 2009, 3:36:03 PM
to gtfs-c...@googlegroups.com
No real comment here. Makes sense.

David

Tom Brown

Dec 30, 2009, 12:35:00 PM
to Google Transit Feed Spec Changes
While it is good to document current practice, in this case maintaining
backwards compatibility with the constraint "Only one transfer type
can be specified for an ordered stop pair" will be difficult. For
example, if the proposals to add from_route_id and to_route_id are
accepted, it should be okay to list the same ordered stop pair multiple
times with different route_id constraints and transfer types. I'm
extending this proposal to say:

Only one transfer type can be specified for an ordered stop pair. When
reading transfers.txt, ignore rows that contain values for unknown
columns, because those columns may limit the applicability of a transfer.

Joe Hughes

Feb 22, 2010, 1:43:03 PM
to gtfs-c...@googlegroups.com

I'm not sure that it's a good idea to say "ignore rows for which
unknown columns have values", since it would seem to break
extensibility. In general, you want to be able to add annotations to
existing rows, but under this rule any future extension that fills in
a new column on every row would make your transfers.txt unreadable to
an old parser, because the parser would end up ignoring all the rows.

Joe Hughes
Google


Tom Brown

Feb 22, 2010, 3:44:04 PM
to gtfs-c...@googlegroups.com
On Mon, Feb 22, 2010 at 10:43, Joe Hughes <joe.hug...@gmail.com> wrote:
> On Wed, Dec 30, 2009 at 9:35 AM, Tom Brown <tom.bro...@gmail.com> wrote:
> > While it is good to document current practice in this case maintaining
> > backwards compatibility with the constraint  "Only one transfer type
> > can be specified for an ordered stop pair" will be difficult. For
> > example, if the proposals to add from_route_id and to_route_id are
> > accepted it should be okay to list the same ordered stop pair multiple
> > times with different route_id constraints and transfer types.  I'm
> > extending this proposal to say:
> >
> > Only one transfer type can be specified for an ordered stop pair. When
> > reading transfers.txt ignore rows that contain values for unknown
> > columns because they may limit the applicability of a transfer.
>
> I'm not sure that it's a good idea to say "ignore rows for which
> unknown columns have values", since it would seem to break
> extendability.  In general, you want to be able to add annotations for
> existing rows, but this would make it possible to make your
> transfers.txt unreadable to an old parser via any future extension,
> because you'd get in a situation where you'd ignore all the rows.

Future extensions that add transfers.txt columns can be partitioned into those that preserve stop pair uniqueness and those that don't. I think the extensions that break stop pair uniqueness are more likely, for example from_route_id and to_route_id. The intention of "ignore rows for which unknown columns have values" is to give these extensions defined behavior with compliant parsers. I agree that it isn't great.

Consider

from_stop_id,to_stop_id,transfer_type,min_transfer_time
A,B,2,120

and an extension that allows a transfer to be declared as wheelchair_accessible=0 or 1. Data with the extension might look like

from_stop_id,to_stop_id,transfer_type,min_transfer_time,wheelchair_accessible
A,B,2,180,1
A,B,2,90,0

which would break parsers that require unique stop pairs. If we adopt unique stop pairs and ignore rows with unknown values, then future data can maintain backwards compatibility with something like

from_stop_id,to_stop_id,transfer_type,min_transfer_time,wheelchair_accessible
A,B,2,120,
A,B,2,180,1
A,B,2,90,0

Of course, "ignore rows with unknown values" makes it annoying to add extensions which preserve stop pair uniqueness, but I haven't seen any.
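
To make the backwards-compatibility argument concrete, here is a rough sketch of what an older parser would do with the three-row example above if it followed the "ignore rows with unknown values" rule. The names `KNOWN_COLUMNS` and `rows_old_parser_keeps` are invented for this illustration; the sketch assumes "has a value" means "is a non-empty string".

```python
import csv
import io

# Columns an older parser is assumed to understand.
KNOWN_COLUMNS = {"from_stop_id", "to_stop_id", "transfer_type", "min_transfer_time"}


def rows_old_parser_keeps(transfers_csv, known=KNOWN_COLUMNS):
    """Drop every row that has a non-empty value in an unknown column."""
    kept = []
    for row in csv.DictReader(io.StringIO(transfers_csv)):
        unknown_values = [v for col, v in row.items() if col not in known and v]
        if not unknown_values:
            # Keep only the columns this parser understands.
            kept.append({col: row[col] for col in row if col in known})
    return kept


EXTENDED = """from_stop_id,to_stop_id,transfer_type,min_transfer_time,wheelchair_accessible
A,B,2,120,
A,B,2,180,1
A,B,2,90,0
"""

kept = rows_old_parser_keeps(EXTENDED)
# Only the row with an empty wheelchair_accessible survives, so the
# ordered pair (A, B) is unique again from the old parser's point of view.
print(kept)
```

Note that "0" counts as a value here, so the A,B,2,90,0 row is dropped as well.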

Alternate actions:
+ do nothing; the spec does not require stop pair uniqueness but Google's parser does and behaviour when a transfer is duplicated is undefined.
+ add only the rule that rows with duplicate from_stop_id,to_stop_id are an error. This makes it impossible to define an extension adding an index column without breaking old parsers, adding a new table, or some other trickery I haven't worked out ;-)
+ create a way to identify future columns as part of the index
+ if there are duplicate rows ignore all rows with unknown values
+ if there are duplicate rows take the one with the lowest count of unknown values
+ if there are duplicate rows take the first to appear in the file
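
The last three alternatives differ only in how a consumer picks among duplicate rows, which can be sketched side by side. This is an illustration under assumptions: the policy labels ("drop_unknown", "fewest_unknown", "first") and the helper names are made up, rows are plain dicts as a CSV reader would produce, and ties under "fewest_unknown" fall back to file order.

```python
from collections import defaultdict

KNOWN = {"from_stop_id", "to_stop_id", "transfer_type", "min_transfer_time"}


def unknown_count(row, known=KNOWN):
    """Number of non-empty values in columns the parser does not know."""
    return sum(1 for col, value in row.items() if col not in known and value)


def resolve(rows, policy):
    """Apply one duplicate-handling policy per ordered stop pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["from_stop_id"], row["to_stop_id"])].append(row)
    result = []
    for group in groups.values():
        if len(group) == 1:
            result.extend(group)
        elif policy == "drop_unknown":
            result.extend(r for r in group if unknown_count(r) == 0)
        elif policy == "fewest_unknown":
            result.append(min(group, key=unknown_count))  # ties: first in file
        elif policy == "first":
            result.append(group[0])
        else:
            raise ValueError(policy)
    return result


def make_row(min_time, accessible):
    return {"from_stop_id": "A", "to_stop_id": "B", "transfer_type": "2",
            "min_transfer_time": min_time, "wheelchair_accessible": accessible}


rows = [make_row("180", "1"), make_row("120", ""), make_row("90", "0")]

for policy in ("drop_unknown", "fewest_unknown", "first"):
    print(policy, [r["min_transfer_time"] for r in resolve(rows, policy)])
```

With the fallback row deliberately placed second, "first" picks the 180-second row while the other two policies pick the 120-second fallback, which shows that "first" makes the result depend on row order. "drop_unknown" can still leave duplicates if several rows have no unknown values.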

I thought about writing this email as a list of possible versions of GTFS, as specified, in practice and in the future but decided this would be easier to follow. Did I lose everyone?

Arno Eigenwillig

Mar 29, 2010, 11:38:59 AM
to gtfs-c...@googlegroups.com
Hi

Here is another thought on the issue (which we will probably meet
again in pathways.txt).

How about we give each row a chance to declare how much a
GTFS-consumer needs to understand of its key columns so that it can be
used safely?

One way to do this is to add a column min_required_key_column_names,
which contains a whitespace-separated list of column names. For each
row, the GTFS-consumer checks if it knows all the columns. If no, it
ignores this row. If yes, it processes this row. Among the processed
rows, each key value must be unique.

This is different from just looking at which columns are used in each row:

- There is no longer a problem with a new attribute column (like
wheelchair_accessible, which is not involved in deciding if a row
applies to a certain transfer): it's just never included in
min_required_key_column_names.

- Requiring fewer columns than used allows graceful fallback.
Imagine a hypothetical extension of transfers.txt by start_time and
end_time that declare a min_transfer_time record to be applicable only
during certain hours of the day, possibly to require longer transfer
durations during crowded rush hours. The usual min_transfer_time that
holds for most of the day can have min_required_key_column_names =
"from_stop_id to_stop_id" so that older consumers use just this row;
the other rows get min_required_key_column_names = "from_stop_id
to_stop_id start_time end_time" so that they are only processed by
consumers that can handle start and end times.
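
A small sketch of the per-row check described above, using the hypothetical start_time/end_time extension as sample data. The function name `usable_rows` and the sample times are assumptions for illustration; min_required_key_column_names is the proposed column, not part of the published spec.

```python
import csv
import io


def usable_rows(transfers_csv, known_keys):
    """Keep rows whose required key columns are all known to this consumer,
    then insist that the key is unique among the rows that were kept."""
    known = set(known_keys)
    kept, seen = [], set()
    for row in csv.DictReader(io.StringIO(transfers_csv)):
        required = (row.get("min_required_key_column_names") or "").split()
        if not set(required) <= known:
            continue  # row depends on a key column this consumer does not know
        key = tuple(row.get(col, "") for col in known_keys)
        if key in seen:
            raise ValueError(f"duplicate key {key}")
        seen.add(key)
        kept.append(row)
    return kept


FEED = """from_stop_id,to_stop_id,transfer_type,min_transfer_time,start_time,end_time,min_required_key_column_names
A,B,2,120,,,from_stop_id to_stop_id
A,B,2,300,07:00:00,09:00:00,from_stop_id to_stop_id start_time end_time
"""

old = usable_rows(FEED, ("from_stop_id", "to_stop_id"))
new = usable_rows(FEED, ("from_stop_id", "to_stop_id", "start_time", "end_time"))
print(len(old), len(new))  # 1 2
```

The older consumer sees only the all-day row; the newer one sees both rows and distinguishes them by the time window.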

A disadvantage of this idea is its verbosity. Therefore I like the
following variant better: Have a column min_required_key_columns for
the *number* of key columns a GTFS-consumer must know before
processing a row. The consumer, written for GTFS as it was at one
time, must enumerate the columns specified by GTFS at that time and
count how many of them appear in the table header. If the consumer
finds fewer than min_required_key_columns, it does not process the row.

For making columns optional, this is less flexible than
min_required_key_column_names described above, because a feed using
new columns from two successive extensions of GTFS cannot make the
first set of new columns optional but the second set required. But
counting the number of columns present in the table is still more
flexible than a "version number", because it works in our setting of a
base spec with proposed extensions on top, each of which may be
incorporated into the base spec at an unpredictable time after feed
producers and consumers have started to use it.
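
The counting variant might look roughly like this on the consumer side. Again a sketch under assumptions: `rows_by_required_count` is a made-up name, the start_time/end_time columns are the hypothetical extension from earlier in this message, and a missing or empty min_required_key_columns is treated as 0.

```python
import csv
import io


def rows_by_required_count(transfers_csv, key_columns_known_to_consumer):
    """Count how many key columns known to this consumer appear in the
    header, then keep rows whose min_required_key_columns is not larger."""
    reader = csv.DictReader(io.StringIO(transfers_csv))
    header = reader.fieldnames or []
    known_in_header = sum(1 for col in header if col in key_columns_known_to_consumer)
    kept = []
    for row in reader:
        needed = int(row.get("min_required_key_columns") or 0)
        if known_in_header >= needed:
            kept.append(row)
    return kept


FEED = """from_stop_id,to_stop_id,transfer_type,min_transfer_time,start_time,end_time,min_required_key_columns
A,B,2,120,,,2
A,B,2,300,07:00:00,09:00:00,4
"""

old = rows_by_required_count(FEED, {"from_stop_id", "to_stop_id"})
new = rows_by_required_count(
    FEED, {"from_stop_id", "to_stop_id", "start_time", "end_time"})
print(len(old), len(new))  # 1 2
```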

What do you think?
Arno Eigenwillig, Google

--
Google Switzerland GmbH

David Turner

Apr 1, 2010, 8:26:47 PM
to gtfs-c...@googlegroups.com
On Mon, 2010-03-29 at 17:38 +0200, Arno Eigenwillig wrote:
> Hi
>
> Here is another thought on the issue (which we will probably meet
> again in pathways.txt).
>
> How about we give each row a chance to declare how much a
> GTFS-consumer needs to understand of its key columns so that it can be
> used safely?

This seems likely to work well for feed consumers, but to be quite
confusing for feed producers. Not for any fundamental reason -- it's
just that the new column requires a lot of thought to use correctly.

I don't immediately see a better option other than telling feed
consumers that they had better keep up with the spec.

Tom Brown

Apr 3, 2010, 2:15:05 AM
to gtfs-c...@googlegroups.com
On Thu, Apr 1, 2010 at 5:26 PM, David Turner <nov...@novalis.org> wrote:
> I don't immediately see a better option other than telling feed
> consumers that they had better keep up with the spec.
 
This isn't just about consumers that fell behind the latest version of the spec. Feed producers have no way to include data with an experimental key column in an existing feed.

How about the following, it isn't pretty: A key column either is one of the existing key columns or has a column name starting with key_. If a consumer parses a row with a value in an unknown key column the row is skipped.

min_required_key_column_names and min_required_key_columns are more flexible because you can declare the special columns for each row independently. key_ is less verbose than either of them. Unlike min_required_key_columns there is no problem with multiple extensions adding columns.
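
A sketch of the key_ rule as a consumer might apply it. The names `BASE_KEY_COLUMNS` and `rows_after_key_prefix_rule`, and the extra "notes" column, are hypothetical; "notes" is there only to show that an unknown column without the key_ prefix does not cause a row to be skipped.

```python
import csv
import io

BASE_KEY_COLUMNS = {"from_stop_id", "to_stop_id"}


def rows_after_key_prefix_rule(transfers_csv, understood_key_columns=frozenset()):
    """Skip a row if it has a value in a key_ column this consumer does not
    understand; unknown columns without the prefix never block a row."""
    known = BASE_KEY_COLUMNS | set(understood_key_columns)
    kept = []
    for row in csv.DictReader(io.StringIO(transfers_csv)):
        blocked = any(
            col.startswith("key_") and col not in known and value
            for col, value in row.items()
        )
        if not blocked:
            kept.append(row)
    return kept


FEED = """from_stop_id,to_stop_id,transfer_type,min_transfer_time,key_wheelchair_accessible,notes
A,B,2,120,,default
A,B,2,180,1,step-free route
A,B,2,90,0,stairs
"""

old = rows_after_key_prefix_rule(FEED)
new = rows_after_key_prefix_rule(FEED, {"key_wheelchair_accessible"})
print(len(old), len(new))  # 1 3
```

How the newer consumer treats the row with an empty key_wheelchair_accessible is left to the extension and is not modelled here.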

David Turner

Apr 5, 2010, 12:20:38 PM
to gtfs-c...@googlegroups.com
On Fri, 2010-04-02 at 23:15 -0700, Tom Brown wrote:
> On Thu, Apr 1, 2010 at 5:26 PM, David Turner <nov...@novalis.org>
> wrote:
> I don't immediately see a better option other than telling
> feed
> consumers that they had better keep up with the spec.
>
> This isn't just about consumers that fell behind the latest version of
> the spec. Feed producers have no way to include data with an
> experimental key column in an existing feed.
>
> How about the following, it isn't pretty: A key column either is one
> of the existing key columns or has a column name starting with key_.
> If a consumer parses a row with a value in an unknown key column the
> row is skipped.

So, as an example, for transfers.txt, a new column might be
key_wheelchair_accessible. Then an empty value would indicate that
this row should be applied by consumers that don't understand
key_wheelchair_accessible, while any other value would only be used by
consumers that do understand the column.

Is this correct?

Tom Brown

Apr 5, 2010, 12:55:37 PM
to gtfs-c...@googlegroups.com
Yes, for consumers that don't understand key_wheelchair_accessible.
How consumers that do understand key_wheelchair_accessible interpret a row with an empty key_wheelchair_accessible value depends on the extension. The extension could specify that key_wheelchair_accessible must have a value (other than rows present for backwards compatibility) or the extension could specify that an empty value is equivalent to accessibility-unknown. You suggest the former, which is reasonable.


Arno Eigenwillig

Apr 14, 2010, 5:14:52 AM
to gtfs-c...@googlegroups.com
Hi David!

On 2 April 2010 02:26, David Turner <nov...@novalis.org> wrote:
>> How about we give each row a chance to declare how much a
>> GTFS-consumer needs to understand of its key columns so that it can be
>> used safely?
>
> This seems likely to work well for feed consumers, but to be quite
> confusing for feed producers. Not for any fundamental reason -- it's
> just that the new column requires a lot of thought to use correctly.

Yes and no... It is probably tricky to write code for filling
min_required_key_columns with values that use its full power to
specify intelligent fallbacks. But the proposal also supports less
tricky uses:

- min_required_key_columns refers to the number of key columns in the
table that the consumer knows. A feed producer can just set it to the
same value in each row, namely the number of key columns it has put in
the table. That's easy to determine statically, but already more
flexible than fixed version numbers.

- The new column is optional, even in the presence of more key columns
than we have today. A feed producer can just omit it to do what you
proposed:

> telling feed consumers that
> they had better keep up with the spec.

Likewise, feed consumers can ignore the column if they know their feed
providers well enough.

In this way, we do not put a burden on producers/consumers who feel
they do not need it, but provide a common language for those who do.

Arno Eigenwillig

Apr 14, 2010, 5:14:57 AM
to gtfs-c...@googlegroups.com
On 5 April 2010 18:55, Tom Brown <tom.bro...@gmail.com> wrote:
>> > How about the following, it isn't pretty: A key column either is one
>> > of the existing key columns or has a column name starting with key_.
>> > If a consumer parses a row with a value in an unknown key column the
>> > row is skipped.
>>
>> So, as an example, for transfers.txt, a new column might be
>> key_wheelchair_accessible.  Then a value of empty would indicate that
>> this row should be applied by consumers that don't understand
>> key_wheelchair_accessible, while any other value would only be used by
>> feeds that do understand the column.
>>
>> Is this correct?
>
> Yes, for consumers that don't understand key_wheelchair_accessible.
> How consumers that do understand key_wheelchair_accessible interpret a row
> with an empty key_wheelchair_accessible value depends on the extension. The
> extension could specify that key_wheelchair_accessible must have a value
> (other than rows present for backwards compatibility) or the extension could
> specify that an empty value is equivalent to accessibility-unknown. You
> suggest the former, which is reasonable.

wheelchair_accessible is an easy case, because feed consumers ignorant
about it always want the "default"/"no wheelchair" case.

I don't yet see clearly how this would work out for extensions that do
not allow such a simple fallback rule, or for combinations of such
extensions. My gut feeling says: the complexity that becomes explicit
in my attempt to define min_required_key_columns will arise here as
well, but through the backdoor of how the various extensions interact
with their policies on default and fallback values.

Venjense

May 2, 2010, 1:41:52 PM
to General Transit Feed Spec Changes
May I ask why you would include metadata about a transfer? Meaning, if
the stop location (station, platform, street) is ADA accessible
(Partial, Fully, Vehicle provides lift, etc) then the transfer would
be viable to that level of ADA compliance i.e. fully wheelchair
accessible, partially on a platform with ramps, or not at all.

If I needed to travel via a stop where the transfer exists only at that
stop location, I would have an option to look for ADA capable trips
based on data about the stop locations.

Granted, more things would have to be added to the stops.txt
specification, but it seems as though transfers.txt speaks more to
whether a route can make a transfer to another route.

/2cents

