Handling stop_id "NULL"

Melinda Morang

May 17, 2023, 4:08:27 PM
to Transit Developers
Hello GTFS developers.  I've just come across a GTFS dataset in which one of the stop_id values is the string "NULL".

My code uses the pandas package in Python to read the GTFS files and then does some validation. The validation is failing because pandas's read_csv() function, by default, interprets the string "NULL" (along with "NaN", "NA", and several others) as a missing value, and my code's validation methods flag this as a missing stop_id.
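A minimal reproduction of this behavior, assuming a recent pandas version and a made-up two-row stops.txt held in memory: with default settings, read_csv treats "NULL" as one of its built-in missing-value markers, so the first stop_id comes back as NaN.

```python
import io

import pandas as pd

# A tiny, made-up stops.txt in which one stop_id is the literal string "NULL".
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
NULL,Main St,34.05,-118.25
S2,Elm St,34.06,-118.24
"""

# Default read_csv settings: "NULL" is in pandas's built-in list of NA strings.
stops = pd.read_csv(io.StringIO(STOPS_TXT))

print(stops["stop_id"].isna().tolist())  # [True, False]
```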

My guess is that whatever software generated this data in the first place encountered a null stop_id in the database and exported it as the string "NULL" when it wrote the GTFS CSV files. Something somewhere in the agency's database is probably incorrect or incomplete. However, the use of this stop in the dataset seems consistent and valid: there is only one stop with stop_id "NULL", and there are entries in stop_times.txt where trips visit "NULL" at specific times of day. There's no reason my code wouldn't work with this dataset if it made it through the validation stage.
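The consistency argument can be checked mechanically. The sketch below uses a hypothetical miniature feed (in-memory strings standing in for stops.txt and stop_times.txt), reads both tables as plain text, and confirms that stop_id is unique in stops.txt and that every stop_id in stop_times.txt resolves to a stop. It illustrates the idea of the check only and is not a full GTFS validator.

```python
import io

import pandas as pd

# Hypothetical miniature feed: one stop has the ID "NULL", and a trip visits it.
STOPS_TXT = "stop_id,stop_name\nNULL,Main St\nS2,Elm St\n"
STOP_TIMES_TXT = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,NULL,1\n"
    "T1,08:05:00,08:05:00,S2,2\n"
)


def read_as_text(text: str) -> pd.DataFrame:
    # Read every column as a string and disable NA detection entirely,
    # so "NULL" survives as an ordinary identifier.
    return pd.read_csv(io.StringIO(text), dtype=str, keep_default_na=False)


stops = read_as_text(STOPS_TXT)
stop_times = read_as_text(STOP_TIMES_TXT)

# stop_id should be unique within stops.txt ...
is_unique = stops["stop_id"].is_unique
# ... and every stop_times.stop_id should refer to a row in stops.txt.
dangling = stop_times.loc[~stop_times["stop_id"].isin(stops["stop_id"]), "stop_id"]

print(is_unique, dangling.tolist())  # True []
```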

I can adjust the parameters of the read_csv() function so that it stops interpreting "NULL" as a null value, and this dataset will then work fine, but I'm not sure that's the right approach. It seems risky for other datasets, where a "NULL" value might indicate some broader problem.
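One possible middle ground, offered as an assumption rather than a settled answer: switch off pandas's built-in NA list but keep genuinely empty fields as missing. A blank stop_id still fails validation, while "NULL" is read as text. `read_gtfs_table` is a hypothetical helper name; `dtype`, `keep_default_na` and `na_values` are standard `pandas.read_csv` parameters.

```python
import io

import pandas as pd


def read_gtfs_table(source) -> pd.DataFrame:
    """Read a GTFS CSV table, keeping every field as text.

    Only a genuinely empty field becomes NaN; sentinel-looking strings
    such as "NULL", "NA" or "nan" are kept verbatim.
    """
    return pd.read_csv(
        source,
        dtype=str,              # GTFS fields are text; don't coerce IDs to numbers
        keep_default_na=False,  # turn off pandas's built-in NA string list
        na_values=[""],         # ...but still treat empty fields as missing
    )


# Made-up stops.txt: one "NULL" stop_id and one genuinely empty stop_id.
stops = read_gtfs_table(io.StringIO("stop_id,stop_name\nNULL,Main St\n,Elm St\n"))
print(stops)
```

A side benefit of `dtype=str` is that all-numeric IDs such as "0012" are not converted to the integer 12.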

As far as I'm aware, nothing in the GTFS spec forbids using "NULL" as a stop_id value. So... should I adjust my code to support it?

Any best practices?  What does the official GTFS validator do in this case?
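To address the concern about datasets where "NULL" might point to a broader problem, one option is to accept such an ID as valid but report it as a warning rather than an error. The function name `validate_stop_ids` and the `SUSPICIOUS_IDS` set below are hypothetical and purely illustrative. This does not describe what the official GTFS validator does.

```python
import math

# Hypothetical set of strings that often mean a database null leaked into the export.
SUSPICIOUS_IDS = {"null", "nan", "none", "na", "n/a", "#n/a"}


def validate_stop_ids(stop_ids):
    """Return (errors, warnings) for a sequence of stop_id values.

    Missing/empty IDs and duplicates are errors. IDs that merely look like a
    null placeholder are warnings, since the spec does not forbid them.
    """
    errors, warnings = [], []
    seen = set()
    for row, value in enumerate(stop_ids):
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is None or is_nan or str(value).strip() == "":
            errors.append(f"row {row}: missing stop_id")
            continue
        if value in seen:
            errors.append(f"row {row}: duplicate stop_id {value!r}")
        seen.add(value)
        if str(value).strip().lower() in SUSPICIOUS_IDS:
            warnings.append(f"row {row}: stop_id {value!r} looks like a null placeholder")
    return errors, warnings


errors, warnings = validate_stop_ids(["NULL", "S2", ""])
print(errors)
print(warnings)
```

With pandas, the values can be passed in as `df["stop_id"].tolist()`, after reading the table with NA detection limited to empty fields.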

Melinda Morang
Product Engineer, Esri

Richard Wolf

May 17, 2023, 6:02:21 PM
to 'Emma Blue' via Transit Developers
Huh ... interesting question. I am not an expert (I would love to see what the experts think), but my overall impression of the GTFS static feed specification is that everything is, basically, a string ... and that while some strings do need to have certain formats (e.g., Currency Code, Latitude, Longitude, Color), others can be ... anything that is a string. In other words, the concept of "nullity" doesn't seem to exist in GTFS. If "null" is supplied as the value of a field that has no other formatting restriction, then it is as good as any other four-character string ... I think any software reading the feed would do best to interpret "null" as simply the four characters "n", "u", "l", "l" and remain agnostic as to any interpretation of those characters.
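This "everything is a string" reading can be illustrated with Python's standard-library csv module, which performs no type or null inference: every field, including "NULL" and an empty one, comes back as a plain str. The in-memory stops.txt is made up for the example.

```python
import csv
import io

# Made-up stops.txt: one "NULL" stop_id and one empty stop_id.
STOPS_TXT = "stop_id,stop_name\nNULL,Main St\n,Elm St\n"

# csv.DictReader does no type or null inference: every field is a str.
rows = list(csv.DictReader(io.StringIO(STOPS_TXT)))
print([row["stop_id"] for row in rows])  # ['NULL', '']
```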