Hello GTFS developers. I've just come across a GTFS dataset in which one of the stop_id values is the string "NULL".
My code uses the pandas package in Python to read the GTFS files and then does some validation. The validation is failing because pandas's read_csv() function interprets the string "NULL" (and "NaN", "NA", and some others) as actual null values by default, and my code's validation methods flag this as a missing stop_id.
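A minimal sketch of the behavior described above, assuming a made-up two-row stops.txt (the stop names and the second stop_id are hypothetical, not taken from the real dataset):

```python
import io

import pandas as pd

# Hypothetical stops.txt content; only the "NULL" stop_id mirrors the real case.
stops_csv = "stop_id,stop_name\nNULL,Main St\n1001,Oak Ave\n"

# Default settings: pandas treats "NULL" as one of its built-in NA markers.
stops = pd.read_csv(io.StringIO(stops_csv))

# Count the stop_id values that were parsed as missing.
missing_stop_ids = int(stops["stop_id"].isna().sum())
print(missing_stop_ids)
```

A validation step that checks for missing stop_id values would flag the first row here, even though the file contains a non-empty string in that field.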
My guess is whatever software generated this data in the first place encountered a null stop_id in the database and just exported it as the string "NULL" when it wrote the GTFS CSV files. Something somewhere in the agency's database is probably incorrect or incomplete. However, the use of this stop in the dataset seems consistent and valid. There is only one stop with stop_id "NULL", and there are entries in stop_times.txt where trips visit "NULL" at specific times of day. There's no reason my code wouldn't work with this dataset if it made it through the validation stage.
I can adjust the parameters of the read_csv() function to stop interpreting "NULL" as an actual null value, and this dataset will work fine, but I'm not sure that's the right approach. It seems risky for other datasets, where a literal "NULL" might indicate some broader problem.
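One possible way to make that adjustment, sketched under the assumption that only empty fields should count as missing. The sample data, the SUSPICIOUS_IDS set, and the idea of reporting such values separately are illustrative assumptions, not something the GTFS spec prescribes:

```python
import io

import pandas as pd

# Hypothetical stops.txt content with one empty optional field (stop_desc).
stops_csv = (
    "stop_id,stop_name,stop_desc\n"
    "NULL,Main St,\n"
    "1001,Oak Ave,Near library\n"
)

stops = pd.read_csv(
    io.StringIO(stops_csv),
    dtype=str,              # keep IDs as strings, e.g. preserve leading zeros
    keep_default_na=False,  # drop the built-in NA markers such as "NULL" and "NaN"
    na_values=[""],         # treat only empty fields as missing
)

# Assumed list of strings worth a warning rather than a hard failure.
SUSPICIOUS_IDS = {"NULL", "NaN", "NA", "None"}
suspicious = stops.loc[stops["stop_id"].isin(SUSPICIOUS_IDS), "stop_id"].tolist()
print(suspicious)
```

With this approach the literal "NULL" survives as a usable stop_id, while a separate check can still surface it as a warning so that a possible upstream export problem does not go unnoticed.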
As far as I'm aware, nothing in the GTFS spec forbids using "NULL" as a stop_id value. So, should I adjust my code to support it?
Any best practices? What does the official GTFS validator do in this case?
Melinda Morang
Product Engineer, Esri