GTFS stop_times.txt timepoint documentation

Nathan Johnson

Jan 13, 2017, 2:41:25 PM
to Transit Developers
The GTFS documentation for the timepoint field states that "it is an error to mark a[n] entry as a timepoint (value=1) without specifying arrival and departure times" and also that an empty timepoint value indicates a timepoint. If a missing timepoint column is interpreted as equivalent to an empty timepoint value, the logical implication is that a GTFS feed without a timepoint column cannot have empty arrival/departure times. Doesn't this mean that the introduction of the timepoint field contradicted the previous GTFS documentation (i.e., empty times without a timepoint column were previously allowed, but are now considered an error)? In practice, many GTFS feeds continue to have empty times and no timepoint column. Are they in error? Please let me know if I am misinterpreting something.

Thanks,
Nathan

Sean Barbeau

Jan 17, 2017, 8:37:25 AM
to Transit Developers
Nathan,
Good question! The original GTFS spec had very clear language saying not to include interpolated times for stops that were not timepoints. Despite this, it had become somewhat of a standard practice for feeds to include those values, as they were considered useful to consumers. The timepoint field is a way to legitimize that data. So yes, technically feeds that include times for all stops but don't include the timepoint field are not conforming to the spec. We are encouraging those producers to add the timepoint field to bring their feeds into alignment with the spec.

Also note there is currently a proposal to further clarify some related language:
https://github.com/google/transit/pull/33

Sean

Nathan Johnson

Jan 17, 2017, 9:43:49 AM
to Transit Developers
Hi Sean,

Thanks for the response. Just to clarify, I'm inferring from the spec that feeds without a timepoint field that *don't* have times for all stops are non-conforming. I'm glad the spec is being clarified, although it could state more explicitly that leaving times empty without a timepoint column is an error. It is also notable that the introduction of the timepoint field effectively broke backward compatibility and means that many GTFS feeds are now non-conforming (unless a missing timepoint column is interpreted as something other than an empty timepoint value).

Nathan

Sean Barbeau

Jan 17, 2017, 10:34:44 PM
to Transit Developers
Nathan,

> Just to clarify, I'm inferring from the spec that feeds without a timepoint field that *don't* have times for all stops are non-conforming.

No, this isn't correct.  Feeds are allowed to omit arrival_time and departure_time for an entry in stop_times.txt as long as those entries are not timepoints.  This was true with the original GTFS spec, and remains true with the current spec (as long as the timepoint field is NOT included). 

> it is notable that the introduction of the timepoint field effectively broke backward compatibility and means that many GTFS feeds are now non-conforming (unless a missing timepoint column is interpreted as something other than an empty timepoint value).

The timepoint field didn't change anything in terms of official conformance to the spec for existing feeds - it just gave agencies a way to provide interpolated times in stop_times.txt and still conform to the spec.  Previously many agencies were including interpolated times, even though the spec said not to.  Those datasets were technically never in conformance, and still aren't (unless they add the "timepoint" field).

To clarify, here are some examples.

Here's Dataset A, which is what the original GTFS format requested that agencies provide in stop_times.txt - you should only provide times for arrivals/departures that are timepoints:

trip_id,arrival_time,departure_time,stop_id,stop_sequence
231414,9:20:00,9:20:00,4301,1
231414,       ,       ,3471,2
231414,       ,       ,4456,3
231414,9:23:00,9:23:00,592,4
231414,       ,       ,593,5
231414,       ,       ,4457,6

The first and fourth stops (stop_ids 4301 and 592) are both timepoints.  This feed conforms with the original GTFS format, as no interpolated times are provided.

Here is Dataset B, which some agencies started producing. It includes times for all stops, which means some of these are timepoints and some aren't, and we don't know which are which. This dataset technically didn't conform to the original GTFS spec (although it became a fairly common practice), and it still doesn't conform to today's spec now that the timepoint field has been added:

trip_id,arrival_time,departure_time,stop_id,stop_sequence
231414,9:20:00,9:20:00,4301,1
231414,9:20:42,9:20:42,3471,2
231414,9:22:10,9:22:10,4456,3
231414,9:23:00,9:23:00,592,4
231414,9:23:54,9:23:54,593,5
231414,9:24:06,9:24:06,4457,6

Here is Dataset C, which is basically Dataset B with the new timepoint field added to indicate which stops are timepoints and which are not (i.e., which are interpolated values):

trip_id,arrival_time,departure_time,stop_id,stop_sequence,timepoint
231414,9:20:00,9:20:00,4301,1,1
231414,9:20:42,9:20:42,3471,2,0
231414,9:22:10,9:22:10,4456,3,0
231414,9:23:00,9:23:00,592,4,1
231414,9:23:54,9:23:54,593,5,0
231414,9:24:06,9:24:06,4457,6,0

This dataset conforms with the current GTFS spec, as it provides times for all rows, but also includes the timepoint field to illustrate which are timepoints and which are not.

So, as of today, Dataset A still conforms with the GTFS spec, and Dataset C conforms with the GTFS spec.  Dataset B never conformed with the spec, and still does not.  However, it's relatively simple to turn Dataset B into Dataset C by adding the timepoint field.
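As a rough illustration of that conversion, here is a minimal Java sketch. It is a hypothetical helper, not part of any GTFS tool, and it assumes the producer already knows which stop_ids are timepoints (4301 and 592 in the example above), that timepoint status depends only on stop_id rather than varying by trip, and that the CSV has no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper: appends a timepoint column to stop_times.txt rows.
// Assumptions: simple CSV with no quoted fields, and timepoint status that
// depends only on stop_id (real schedules may differ from trip to trip).
class AddTimepointColumn {

  static List<String> addTimepoint(List<String> lines, Set<String> timepointStopIds) {
    List<String> out = new ArrayList<>();
    String header = lines.get(0);
    // Locate stop_id from the header instead of hard-coding its position
    int stopIdCol = Arrays.asList(header.split(",", -1)).indexOf("stop_id");
    out.add(header + ",timepoint");
    for (String line : lines.subList(1, lines.size())) {
      String stopId = line.split(",", -1)[stopIdCol].trim();
      // 1 = exact time (timepoint), 0 = approximate (interpolated) time
      out.add(line + "," + (timepointStopIds.contains(stopId) ? "1" : "0"));
    }
    return out;
  }
}
```

Calling `addTimepoint(datasetBLines, Set.of("4301", "592"))` on the Dataset B rows should yield the Dataset C rows. The hard part in practice is not the code but knowing the timepoint set, which only the agency can supply.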

Hopefully that makes sense! :)  I agree that more examples such as the above would make this much clearer - I'll see if I can propose adding more of these to the spec to make these concepts clearer.

Sean

As an aside, a bit of history to put this in context - if I recall correctly the primary motivator for agencies including times for all arrivals (including interpolated values) was when multiple consumers (other than just Google) started using GTFS data.  In many of these apps, you could pull up schedule information for any stop in the system.  For stops that aren't timepoints, an interpolated value has to be used.  However, if that interpolated value isn't provided by the transit agency, then each consumer needs to interpolate this value on their own.  This can lead to each consumer showing a different interpolated scheduled arrival/departure time, even if all apps are using the same GTFS data.  Agencies started including interpolated values to normalize this across apps, and the spec just took a very long time to catch up with this practice by adding the timepoint field.

Andrew Byrd

Jan 18, 2017, 2:33:18 AM
to transit-d...@googlegroups.com

> On 18 Jan 2017, at 11:34, Sean Barbeau <sjba...@gmail.com> wrote:
> As an aside, a bit of history to put this in context - if I recall correctly the primary motivator for agencies including times for all arrivals (including interpolated values) was when multiple consumers (other than just Google) started using GTFS data. In many of these apps, you could pull up schedule information for any stop in the system. For stops that aren't timepoints, an interpolated value has to be used. However, if that interpolated value isn't provided by the transit agency, then each consumer needs to interpolate this value on their own. This can lead to each consumer showing a different interpolated scheduled arrival/departure time, even if all apps are using the same GTFS data. Agencies started including interpolated values to normalize this across apps, and the spec just took a very long time to catch up with this practice by adding the timepoint field.

Exactly. Even when agencies are not required (by operational rules) to adhere to non-timepoint departure times, they may publish or otherwise provide those times. GTFS consumers would prefer to use and display the “official” interpolated times. Interpolation by a general-purpose GTFS consuming application may be much worse than the GTFS producer's own interpolation based on operational experience. Good interpolation also requires a lot of information about traffic congestion conditions, the exact roads taken by vehicles, or historical realtime or probe data.

-Andrew

Nathan Johnson

Jan 18, 2017, 9:43:38 AM
to Transit Developers
Sean,

Thank you for the detailed clarification. In concluding that backward compatibility had been broken, I was assuming that a missing timepoint column was equivalent to a timepoint column with empty values:

stop_times.txt Version X
trip_id,arrival_time,departure_time,stop_id,stop_sequence
TRIP1,10:00:00,10:00:00,STOP1,1
TRIP1,        ,        ,STOP2,2
TRIP1,10:10:00,10:10:00,STOP3,3

stop_times.txt Version Y
trip_id,arrival_time,departure_time,stop_id,stop_sequence,timepoint
TRIP1,10:00:00,10:00:00,STOP1,1,
TRIP1,        ,        ,STOP2,2,
TRIP1,10:10:00,10:10:00,STOP3,3,

From my new understanding, Version X is conforming and Version Y is non-conforming - is this right?

Thanks,
Nathan

Sean Barbeau

Jan 18, 2017, 2:07:58 PM
to Transit Developers
Nathan,
Yes, you are correct.  Version X is definitely conforming.  Version Y has empty values for all records for the timepoint field, so it isn't conforming to the spec and validators should throw an error saying the field is defined in the file but isn't populated.  Some consumers may be smart enough to realize that all values for this field are empty and treat it as if it were Version X (and in fact GTFS consumers that haven't been updated to use the timepoint field would do exactly that, but unintentionally out of ignorance that the field exists in the spec), but I definitely wouldn't make that assumption.

Sean

Nathan Johnson

Jan 18, 2017, 3:48:53 PM
to Transit Developers
Sean,

According to the GTFS spec, an empty timepoint value is valid - it just means that the times should be considered exact (the same as timepoint=1). From my interpretation of the spec, the reason Version Y is non-conforming is that an empty timepoint value means the times are exact, so an empty timepoint in conjunction with empty times is an error. It seems reasonable to treat empty timepoint values as equivalent to a missing timepoint column, which would mean that a stop_times.txt with no timepoint column and any empty times would be non-conforming, breaking backward compatibility.

Nathan

Sean Barbeau

Jan 18, 2017, 4:02:23 PM
to Transit Developers
So I think the language needs to be clarified here, then, if it's being interpreted this way.

Historically for optional GTFS fields "empty" has meant that the field (i.e., CSV header and all values) is not provided.  For example, exact_times in frequencies.txt - https://developers.google.com/transit/gtfs/reference/frequencies-file.

So the intent of the timepoint definition is to say that if the "timepoint" value doesn't appear anywhere in the stop_times.txt file, then you interpret the data as you would prior to the introduction of the timepoint field in the spec (i.e., any times provided are timepoints).  This makes the timepoint spec backwards compatible with existing feeds.  If the "timepoint" field is included in the data, then producers should provide 0 or 1 values for all records.  So you shouldn't just add the "timepoint" field to the CSV header and not provide any values.
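The intent described above translates into a small set of validation checks. The following is a minimal Java sketch of those checks under stated assumptions: simple unquoted CSV, arrival_time and departure_time present in the header, and the rules exactly as stated in this thread (no timepoint column means legacy interpretation; a present column needs 0 or 1 on every row; timepoint=1 needs both times). The class name and messages are hypothetical, not taken from any published validator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical validator sketch for the timepoint rules described in this thread.
// Assumes unquoted CSV whose header includes arrival_time and departure_time.
class TimepointValidator {

  static List<String> validate(List<String> lines) {
    List<String> errors = new ArrayList<>();
    List<String> header = Arrays.asList(lines.get(0).split(",", -1));
    int tpCol = header.indexOf("timepoint");
    if (tpCol < 0) {
      // No timepoint column: legacy interpretation, empty times are allowed
      return errors;
    }
    int arrCol = header.indexOf("arrival_time");
    int depCol = header.indexOf("departure_time");
    for (int i = 1; i < lines.size(); i++) {
      String[] f = lines.get(i).split(",", -1);
      String tp = tpCol < f.length ? f[tpCol].trim() : "";
      boolean hasTimes = !f[arrCol].trim().isEmpty() && !f[depCol].trim().isEmpty();
      int fileLine = i + 1; // header is line 1
      if (!tp.equals("0") && !tp.equals("1")) {
        // Column is present, so every record should carry an explicit 0 or 1
        errors.add("line " + fileLine + ": timepoint is '" + tp + "', expected 0 or 1");
      } else if (tp.equals("1") && !hasTimes) {
        errors.add("line " + fileLine + ": timepoint=1 without arrival and departure times");
      }
    }
    return errors;
  }
}
```

Under this sketch, Version X from earlier in the thread passes, while each row of Version Y is reported because the column exists but is unpopulated.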

Sean

--
You received this message because you are subscribed to a topic in the Google Groups "Transit Developers" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/transit-developers/dwd96EwJqIc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to transit-developers+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nathan Johnson

Jan 18, 2017, 4:27:52 PM
to Transit Developers
Exactly - if empty means the field is not provided, then it seems that a stop_times.txt without a timepoint field would mean all stop times have the default timepoint value (i.e., are exact), meaning that no arrival/departure times are permitted to be empty, breaking backward compatibility. It would seem necessary to introduce some special logic to prevent this - e.g., if a timepoint column is provided then empty times must be accompanied by a timepoint value of 0; if there is no timepoint column then empty times are treated as timepoint=0 and populated times as timepoint=1.
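The special-case logic proposed here can be sketched as a single resolution function. This is a hypothetical Java illustration of the proposal, not spec text. It assumes a null `timepointValue` means the column is absent, and that a row "has times" only when both arrival_time and departure_time are non-empty (the thread doesn't address rows with only one of them).

```java
// Hypothetical sketch of the proposed rule for resolving an effective timepoint.
class EffectiveTimepoint {

  /**
   * Returns 1 (exact) or 0 (approximate) for one stop_times.txt row.
   * timepointValue is null when the file has no timepoint column.
   */
  static int resolve(String timepointValue, String arrival, String departure) {
    // Assumption: a row "has times" only if both fields are non-empty
    boolean hasTimes = !arrival.isBlank() && !departure.isBlank();
    if (timepointValue == null) {
      // No timepoint column: empty times => 0, populated times => 1
      return hasTimes ? 1 : 0;
    }
    String tp = timepointValue.trim();
    if (!tp.isEmpty() && !tp.equals("0") && !tp.equals("1")) {
      throw new IllegalArgumentException("invalid timepoint value '" + tp + "'");
    }
    if (!hasTimes) {
      // Column present: empty times must be paired with an explicit 0
      if (!tp.equals("0")) {
        throw new IllegalArgumentException("empty times require timepoint=0, got '" + tp + "'");
      }
      return 0;
    }
    // Populated times: an empty value defaults to 1 (exact), per the spec wording quoted above
    return tp.equals("0") ? 0 : 1;
  }
}
```

With this rule, a legacy feed like Version X resolves cleanly (STOP2 becomes timepoint 0), while an empty timepoint next to empty times is still rejected.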

Paul Harrington

Jan 24, 2017, 11:11:44 AM
to Transit Developers

Interesting thread, guys. It made me realize that my GTFS implementation was not interpolating arrival or departure times. This hasn't been an issue with the 4 feeds I have consumed so far, as they all fully specify arrival and departure times. I presume there are others out there that don't, though, so I updated the implementation to handle it.

The discussions here and in the GitHub thread linked above about timepoint and arrival/departure time specification are quite abstract at times. Anyway, from an implementation perspective I looked at what the intended aim is: I store the timepoint value if specified but otherwise don't use it. If a stop time entry for a trip has neither an arrival nor a departure time, I work back through the stops for the trip and use the last time specified. It is crude, as consecutive stops end up with the same time, but it is still better than no time at all.

If I were to use the timepoint going forward to indicate to a commuter that a scheduled time is not timed (i.e., approximate), I would only flag this if neither an arrival nor a departure time was entered in the stop time entry, or if at least one of them was entered and a timepoint value of 0 was explicitly specified.
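That flagging rule reduces to one boolean expression: "neither time given" or "some time given and timepoint=0" simplifies to "neither time given or timepoint=0". A minimal Java sketch follows; the class name is hypothetical, and it assumes a null `timepointValue` stands for both an absent column and an empty cell.

```java
// Hypothetical sketch of the "not timed" flag described above.
class NotTimedFlag {

  // timepointValue is null when the column is absent or the cell is empty
  static boolean isNotTimed(String arrival, String departure, String timepointValue) {
    boolean noTimes = arrival.isBlank() && departure.isBlank();
    boolean explicitZero = timepointValue != null && timepointValue.trim().equals("0");
    // Flag when no time was entered, or when timepoint=0 was explicitly given
    return noTimes || explicitZero;
  }
}
```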

I currently load data from stop_times into an equivalent database table and run a cleansing method that implements this. The cleansing code is below; I won't know if it is bug-free until I come across a feed where interpolation is needed. If you do notice any major flaws in the underlying logic, please let me know.

Thanks Paul.

---

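  // Relies on class fields defined elsewhere: a JDBC Connection db, a java.util.logging
  // Logger logger, and ST_UPDATE, an UPDATE taking (arrival_time, departure_time, stop_times_id).
  public void start(int feedId) throws SQLException {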
    int stopTimesId, count;
    int countAddZeroArr, countAddZeroDep, countCopyDeparturetoArrival, countCopyArrivalToDeparture, countNoTimepointArr, countNoTimepointDep;
    boolean updateNeeded;
    int numRecords;
    String tripId, arrivalTime, departureTime; 
    String lastSpecifiedArrivalTime, lastSpecifiedDepartureTime, lastTripId;

    logger.log(Level.INFO, String.format("For feed id %d, converting arrival and departure times such as 6:45:00 to 06:45:00 and interpolating empty arrival/departure times", feedId));
    logger.log(Level.INFO, "  Setting autocommit to false");

    db.setAutoCommit(false);
    long startTableLoad = System.currentTimeMillis();

    try (Statement statement = db.createStatement();
        PreparedStatement update = db.prepareStatement(ST_UPDATE)) {

      ResultSet rs = statement.executeQuery(String.format("select * from stop_times where feed_id=%d order by trip_id, feed_id, stop_sequence", feedId));

      lastSpecifiedArrivalTime = lastSpecifiedDepartureTime = lastTripId = "";
      numRecords = countAddZeroArr = countAddZeroDep = countCopyDeparturetoArrival = countCopyArrivalToDeparture = countNoTimepointArr = countNoTimepointDep = 0;

      while (rs.next()) {

        tripId = rs.getString("trip_id");
        stopTimesId = rs.getInt("stop_times_id");
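        // getString() returns null for SQL NULL; treat that as an empty time
        arrivalTime = java.util.Objects.toString(rs.getString("arrival_time"), "").trim();
        departureTime = java.util.Objects.toString(rs.getString("departure_time"), "").trim();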

        updateNeeded = false;

        if (!tripId.equals(lastTripId)) {
          lastSpecifiedArrivalTime = lastSpecifiedDepartureTime = "";
        }

        // Pad single-digit hours, e.g. 6:45:00 -> 06:45:00
        if (arrivalTime.length() == 7) {
          arrivalTime = "0" + arrivalTime;
          countAddZeroArr++;
          updateNeeded = true;
        }

        if (departureTime.length() == 7) {
          departureTime = "0" + departureTime;
          countAddZeroDep++;
          updateNeeded = true;
        }

        // If only one of arrival/departure is given, copy it to the other
        if (arrivalTime.length() < 8 && departureTime.length() == 8) {
          arrivalTime = departureTime;
          countCopyDeparturetoArrival++;
          updateNeeded = true;
        }

        if (departureTime.length() < 8 && arrivalTime.length() == 8) {
          departureTime = arrivalTime;
          countCopyArrivalToDeparture++;
          updateNeeded = true;
        }

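        // Neither time given: carry forward the previous stop's times (this repeats
        // the last known time rather than truly interpolating between timepoints)
        if (arrivalTime.length() < 8 && lastSpecifiedArrivalTime.length() == 8) {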
          arrivalTime = lastSpecifiedArrivalTime;
          countNoTimepointArr++;
          updateNeeded = true;
        }

        if (departureTime.length() < 8 && lastSpecifiedDepartureTime.length() == 8) {
          departureTime = lastSpecifiedDepartureTime;
          countNoTimepointDep++;
          updateNeeded = true;
        }

        if (updateNeeded) {
          
          update.setString(1, arrivalTime);
          update.setString(2, departureTime);
          update.setInt(3, stopTimesId);

          try {
            count = update.executeUpdate();
          }
          catch (SQLException e) {
            // Keep the original exception as the cause so its stack trace is not lost
            throw new SQLException(String.format("StopTimesId=%s ArrivalTime=%s DepartureTime=%s caused an exception [%s]", 
                stopTimesId, arrivalTime, departureTime,  e.getMessage()), e);
          }

          if (count != 1) {
            logger.log(Level.WARNING, String.format("  Error updating stop_times with [StopTimesId=%s ArrivalTime=%s DepartureTime=%s]", 
                stopTimesId, arrivalTime, departureTime));
          }
          else {
            numRecords++;
            if (numRecords%100 == 0) {
              // Commit every 100 records
              db.commit();
            }
          }
        }

        lastTripId = tripId;
        lastSpecifiedArrivalTime = arrivalTime;
        lastSpecifiedDepartureTime = departureTime;
      }

      long endTableLoad = System.currentTimeMillis();
      float tableLoadTime = endTableLoad-startTableLoad;
      tableLoadTime = tableLoadTime/1000;
      logger.log(Level.INFO, String.format("  All records for feed checked, %d converted, process time %.2f seconds", 
          numRecords, tableLoadTime));
      logger.log(Level.INFO, String.format("    %d arrivals with \"0\" prepended, %d departures with \"0\" prepended", countAddZeroArr, countAddZeroDep));
      logger.log(Level.INFO, String.format("    %d non timepoint arrivals, %d non timepoint departures", countNoTimepointArr, countNoTimepointDep));
      logger.log(Level.INFO, String.format("    %d departures copied to arrivals, %d arrivals copied to departures", countCopyDeparturetoArrival, countCopyArrivalToDeparture));
    }
    finally {
      db.commit();
      logger.log(Level.INFO, "  Setting autocommit back to true");
      db.setAutoCommit(true);      
    }
  }

Sean Barbeau

Jan 25, 2017, 9:54:59 AM
to Transit Developers

Paul Harrington

Jan 26, 2017, 5:36:29 AM
to Transit Developers
Thanks for that, Sean. I was able to use it to verify that the timepoint interpolation code was good:

INFO    [2017-01-26 09:52:58]   All records for feed checked, 349164 stop_times converted, process time 20.84 seconds
INFO    [2017-01-26 09:52:58]     0 arrivals with "0" prepended, 0 departures with "0" prepended
INFO    [2017-01-26 09:52:58]     349164 non timepoint arrivals, 349164 non timepoint departures
INFO    [2017-01-26 09:52:58]     0 departures copied to arrivals, 0 arrivals copied to departures

The feed also showed the shortcomings of my timepoint interpolation algorithm, as the number of stops and the time gaps between timed stops were quite large. It could be better to take the time between two timed stops, divide it into equal intervals (one more than the number of untimed stops in between), and increment the times at each intervening stop by that interval. The downside of this, though, is that commuters could more easily miss their bus. I have to say I'm happy that most providers seem to give times for each stop and remove this minor headache.
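Here is a minimal Java sketch of that even-spacing idea, dividing each gap into n + 1 intervals for n untimed stops. It is a hypothetical helper rather than Paul's code: it works on one trip's times already sorted by stop_sequence, assumes GTFS-style H:MM:SS strings (hours may exceed 23), rounds down to whole seconds, and leaves stops before the first or after the last known time empty. Weighting by distance (for example, using shape_dist_traveled) would likely be more realistic but needs more data.

```java
import java.util.Arrays;

// Hypothetical sketch: even-spacing interpolation of missing stop times in one trip.
// Input times are H:MM:SS or HH:MM:SS (hours may exceed 23); null or blank = missing.
class StopTimeInterpolator {

  static int toSeconds(String t) {
    String[] p = t.trim().split(":");
    return Integer.parseInt(p[0]) * 3600 + Integer.parseInt(p[1]) * 60 + Integer.parseInt(p[2]);
  }

  static String format(int s) {
    return String.format("%02d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60);
  }

  /** Fills gaps between known times; leading and trailing gaps stay as given. */
  static String[] interpolate(String[] times) {
    String[] out = Arrays.copyOf(times, times.length);
    int prev = -1; // index of the last stop with a known time
    for (int i = 0; i < times.length; i++) {
      if (times[i] == null || times[i].isBlank()) {
        continue;
      }
      if (prev >= 0 && i - prev > 1) {
        int start = toSeconds(times[prev]);
        int end = toSeconds(times[i]);
        int intervals = i - prev; // n untimed stops => n + 1 equal intervals
        for (int k = 1; k < intervals; k++) {
          // Integer division rounds down to the whole second
          out[prev + k] = format(start + (end - start) * k / intervals);
        }
      }
      out[i] = format(toSeconds(times[i])); // also zero-pads, e.g. 9:20:00 -> 09:20:00
      prev = i;
    }
    return out;
  }
}
```

For example, `interpolate(new String[]{"9:20:00", null, null, "9:23:00"})` spaces the two untimed stops one minute apart, at 09:21:00 and 09:22:00.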

Actually, regarding this feed, I see there is a developer API, but are there plans to add a GTFS-realtime component?

As an aside (apologies, guys, for going off topic), the schedule initially failed to load for me. The feed is different from others I've encountered in that leading/trailing whitespace is not trimmed, e.g.

route_id,service_id,trip_id,direction_id,shape_id
      9248,     87200,   1194302, 1,     24785
      9248,     87200,   1194303, 1,     24785

When trying to insert rows from trips.txt into my trips database table where the direction_id field is defined as:

direction_id char(1) check (direction_id in ('','0','1'))

I was getting an SQL error because the value was " 1", which at two characters long did not fit into a one-character field. Anyway, I've changed my code to trim all fields before DB insertion, and loading is now good.
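The trimming step can be as small as the following Java sketch. It is a hypothetical helper that assumes no quoted fields containing commas; a production loader would normally use a proper CSV parser and trim the parsed values.

```java
import java.util.Arrays;

// Hypothetical helper: splits one CSV record and trims every field.
// Assumes no quoted fields with embedded commas.
class TrimFields {

  static String[] splitAndTrim(String line) {
    // The -1 limit keeps trailing empty fields instead of dropping them
    return Arrays.stream(line.split(",", -1)).map(String::trim).toArray(String[]::new);
  }
}
```

With this, the direction_id value " 1" from the trips.txt rows above becomes "1" and fits the char(1) column.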

Barbeau, Sean

Jan 26, 2017, 9:26:51 AM
to transit-d...@googlegroups.com

I'm not sure about the GTFS-realtime feed timeline for Lynx; I haven't heard anything about it recently.

Paul Harrington

Mar 30, 2017, 11:32:30 AM
to Transit Developers
I have just implemented timepoint support, so a value for it is brought through the system and can be used if desired by an end user or application.
I reread this thread and looked at a few feeds and came to the conclusion that for now the best compromise is:

1)  If the stop_times.txt timepoint field is explicitly specified and has a value of 1, my end user timepoint value is also "1".
2)  If the stop_times.txt timepoint field is explicitly specified and has a value of 0, my end user timepoint value is also "0".
3)  In all other cases my end user timepoint value is "", in other words unspecified.
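These three rules map naturally onto a small enum. The following Java sketch is a hypothetical illustration, not Paul's code. It assumes a null raw value means the feed has no timepoint column, and it trims whitespace before comparing (given the untrimmed feed mentioned earlier). Note that it deliberately treats an empty value as unspecified, as rule 3 does, even though the spec reads an empty value as exact.

```java
// Hypothetical sketch of the three-way mapping above.
enum UserTimepoint {
  TIMED("1"), NOT_TIMED("0"), UNSPECIFIED("");

  final String value;

  UserTimepoint(String value) {
    this.value = value;
  }

  /** rawTimepoint is null when stop_times.txt has no timepoint column. */
  static UserTimepoint fromFeed(String rawTimepoint) {
    if (rawTimepoint == null) {
      return UNSPECIFIED;
    }
    switch (rawTimepoint.trim()) {
      case "1":
        return TIMED;
      case "0":
        return NOT_TIMED;
      default:
        return UNSPECIFIED; // empty or unexpected values
    }
  }
}
```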

The reason for 3) is that a lot of feeds (as discussed above) have fully interpolated times and no timepoint column. While the spec considers these all to be timepoints, it is clear that in many if not most cases they are not, and telling a user so would be misleading. A case in point is MBTA, who have just added a timepoint column to their stop_times.txt. They specify times for all stops across all transport types in their feed, and the current file shows that 87138 out of 2710674 entries are timepoints. Before this change, if one were to indicate to an end user whether stops were timepoints based on GTFS spec rules, one would be indicating that all 2710674 entries were timepoints.

I recognize (as illustrated above) that there are feeds in which times are only specified for some stops, and these stops are most likely timepoints, with the other stops not being timed. My system would report them all as unspecified, and that is where the compromise is.

Aaron Antrim

May 30, 2017, 4:04:45 PM
to Transit Developers
I've brought up a question along these lines in the GitHub "GTFS" repo:
https://github.com/google/transit/issues/61

I think we should develop a pull request to make expectations for data and application behaviors more clear and easy to understand.

Sean Barbeau

May 30, 2017, 4:38:54 PM
to Transit Developers
Agreed, I just commented on Github with an updated description of the field that hopefully makes expected behavior easier to understand.

Sean