csv syntax

Tom Brown

Nov 12, 2008, 5:39:24 PM
to Google Transit Feed Spec Changes
I've been making some changes to the feed validator to warn about
unusual syntax that is not interpreted in the same way by all parsers.
Each of these changes was motivated by one or more problematic GTFS
files that I've seen.

The validator now adds a warning if a file contains a header such as
stop_id, stop_name, stop_lat , stop_lon
The spaces before the names are okay, but the space after stop_lat is
considered by many parsers, including Excel, to be part of the value.
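
To illustrate the kind of check described here, a minimal sketch in
Python follows. It is not the feed validator's actual code; the
function name header_trailing_space_warnings is made up for this
example. It relies on the standard csv module, whose skipinitialspace
option drops spaces after a delimiter but leaves spaces before one
alone.

```python
import csv
import io


def header_trailing_space_warnings(header_line):
    """Return the header names that end in whitespace.

    Hypothetical helper for illustration only, not part of the
    feed validator.
    """
    # skipinitialspace=True ignores spaces right after each comma, so
    # " stop_name" is read as "stop_name". Spaces before a comma are
    # kept, so "stop_lat " keeps its trailing space.
    reader = csv.reader(io.StringIO(header_line), skipinitialspace=True)
    fields = next(reader)
    return [name for name in fields if name != name.rstrip()]


print(header_trailing_space_warnings(
    "stop_id, stop_name, stop_lat , stop_lon"))
```

For the header above, only the stop_lat column is reported.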

The validator also checks that each line ends in CRLF or LF. Files
created on very old Apple computers will need to be converted. This
check also warns about some hard-to-find corruption issues.
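
A line-ending check along these lines could look like the sketch
below. This is an assumption about one way to do it, not the
validator's implementation; bare_cr_line_numbers is a hypothetical
name. It flags any CR that is not immediately followed by LF, which
is what a file with old Mac-style CR-only line endings would contain.
Note that it would also flag a bare CR inside a quoted field.

```python
import re

# A CR that is not the first half of a CRLF pair.
BARE_CR = re.compile(rb"\r(?!\n)")


def bare_cr_line_numbers(data):
    """Return sorted 1-based line numbers that contain a bare CR.

    Lines are counted by LF, so a CR-only file is reported as one
    long line 1. Hypothetical helper for illustration only.
    """
    numbers = set()
    for match in BARE_CR.finditer(data):
        # Count the LFs before the match to find its line number.
        numbers.add(data.count(b"\n", 0, match.start()) + 1)
    return sorted(numbers)


print(bare_cr_line_numbers(b"stop_id,stop_name\rS1,Main St\r"))
```

A file that uses only CRLF or LF produces an empty list.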

Before making these changes I checked all the feeds we have at Google
to see if they would cause a widespread problem, and they didn't seem
to. Most or all of these problematic files were created when people
modified files by hand, not by tools that systematically interpret
the format differently.

My proposal:
Directly refer to http://tools.ietf.org/html/rfc4180 as the expected
format for GTFS CSV files. There are a couple of deviations from
RFC 4180 common in GTFS files:
1) There are often one or more space characters after the comma
between fields. Tools that parse GTFS should skip these spaces.
2) Some GTFS files start with a UTF-8 byte order mark, which parsers
should skip.
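
As a rough sketch of a parser that tolerates both deviations, the
Python below uses only the standard library. The function name
read_gtfs_rows and the sample data are made up for this example. The
"utf-8-sig" codec removes a leading byte order mark if one is present,
and skipinitialspace=True makes csv.reader skip spaces after each
comma.

```python
import csv
import io


def read_gtfs_rows(data):
    """Parse the bytes of a GTFS CSV file into a list of rows.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM when present.
    text = data.decode("utf-8-sig")
    # newline="" leaves line endings untouched so the csv module can
    # handle CRLF and LF itself.
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, skipinitialspace=True))


# Invented sample: BOM, spaces after commas, and a quoted comma.
sample = b'\xef\xbb\xbfstop_id, stop_name\r\nS1, "Main St, North"\r\n'
for row in read_gtfs_rows(sample):
    print(row)
```

Spaces before a comma are still kept as part of the value, which
matches the behavior the header warning above is about.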

Tom Brown

Nov 19, 2008, 7:34:44 PM
to Google Transit Feed Spec Changes

If you have any comments regarding this proposal, please speak up. If everyone remains silent, I'll forward this to our tech writer next week.