proposal: time format

21 views
Skip to first unread message

Bob Heitzman

unread,
Jun 14, 2007, 3:36:38 PM6/14/07
to Google Transit Feed Spec Changes
Suggest supporting the substituion of underscores for colons in fields
representing "time" in the GTFS text files.

When working with MS Office tools the current "time" format is
automagically recognized as a time value and data types and formating
are assigned. However, the GTFS format for "time" is not truly a time
value since the HH position can exceed 23. A simple way to prevent the
automagic conversion is to use a format that MS does not recognize as
a time value. Hence, it is proposed that HH_MM_SS be a valid time
value for GTFS text files.

Joe Hughes

unread,
Jun 18, 2007, 3:54:40 AM6/18/07
to Google Transit Feed Spec Changes
Hi Bob,

Thanks for taking a look at these sorts of Office-related issues.
Could you explain in more detail what problems the current time format
causes when creating feeds using Office? I tried entering times like
"27:00" in Office 2003 on Windows XP, and it seemed to preserve those
values correctly when writing the spreadsheet to CSV.

Joe Hughes
Google Transit

Bob Heitzman

unread,
Jun 19, 2007, 11:46:46 AM6/19/07
to Google Transit Feed Spec Changes
This suggestion is mostly based on experience - working with strings
that are valid time sequences in MS Office often results in obscure
problems difficult to notice especially in large data sets.

For example if you open/import such a field it will be treated as a
datatime format. It is often not obvious that such a conversion has
taken place. For example if you open the stop_times.txt file in the
sample the time strings look OK except that any leading zeros, if any,
are removed. If you examine the format of the cells a time format has
most likely been assigned. If you try to change any imported hour
values to 24+ Excel will display the time past midnight and not the
string you entered.

In your case most likely the cell format was set to Text and Excel had
already be told to ignore the fact that the strings were to be treated
as time. You may even get Excel to display 24+ hours on the screen but
if you save the file I would bet you will not see the 24+ hour field.

Access has similar issues.

I am working on an Export tool that will deal with the time format as
currently expected by the feed format. My import routine also deals
with the issue by replacing colons with underscores on import.

To directly answer you question there aren't any specific problems
with the current time format (other than it isn't really a time
format). But experince tells me that strings the look like time/date
are trouble in MS Office if the strings are expected to retain a
specific format. (regional settings is another issue) So if you want
to use MS Office tools to manage feed data things will be much easier
if the strings can never be mistaken for time/date values. If MS
Office users want to use the data as time/date values they can
intentionally convert the data to datetime format.

It would be nice if the server side import could tolerate underscores
as well as colons as time delimiters. A simple replace of underscores
with colons just before you process the time string should be a one
line change.

BTW there is a similar problems with numbers that are expected to have
leading zeros - this is very difficult to manage in MS Office.

T Sobota

unread,
Jun 22, 2007, 9:04:26 AM6/22/07
to Google Transit Feed Spec Changes
Another alternative for time would be the seconds past midnight format
(this happens to be the DB storage format for one particular transit
scheduling software package).

This format does work particularly well for trips past midnight (86400
seconds and above) - and can also be calculated in the MS Office
environment from their decimal-based time formatting by multiplying
the fractional MS time value by 86400 [07:00 AM = .291667 * 86400 =
25200 second past midnight].

I do realize this magnitude of change could well require that all
existing transit feeds be reformatted, although I would wonder if the
logic of the feed processing on Google's end might allow both the
current and the SPM format - somehow keying in on the field value
where if a colon (or underscore) were present to interpret the data in
an HH MM SS manner, but when lacking such time punctuation calculating
the time based on the SPM basis?

-Tim Sobota

Joe Hughes

unread,
Jun 22, 2007, 4:39:26 PM6/22/07
to gtfs-c...@googlegroups.com
Hi Tim,

Thanks for the suggestion. Seconds past midnight has some advantages
(less calculation required) and some disadvantages (harder for humans
to spot-check or enter by hand) compared the HH:MM:SS past midnight
that the GTFS currently uses.

If we were to change the time format now, though, we would either need
to break compatibility with existing feeds (causing extra work for
agencies and writers of feed-publishing software), or complicate the
spec by adding two possible time formats. The problem with the latter
is that it increases the amount of work that an application creator
needs to do to add basic feed import support to their product, which
then leads to less products supporting GTFS, or products that offer
poor support because they only implement one of the two possible time
formats. This in turn decreases the usefulness of GTFS, and increases
the likelihood that the agency will need to do extra work to use their
data with transit applications other than Google Transit.

So, as I mentioned in the welcome message, we should try to minimize
the amount of churn in the format, and avoid making incompatible
changes until most of us are convinced they have strong, clear
advantages.

Anyway, welcome to the group, and thanks for taking the time to get
involved in these discussions!

Joe Hughes
Google Transit

Reply all
Reply to author
Forward
0 new messages