1. Missing Data
The new subway data omits vital information that was present in the old format.
Here's the csv heading from last week's data:
There are two export buttons on the NYS Open Data website.
The first downloads all the data from Feb 2022. Here's the csv header:
The second export button provides this csv header:
You will notice that the Exit count included in the previous data is gone. This is a serious omission: without exit counts, changes in travel patterns cannot be determined.
2. Time Stamp Glitch
Here are time stamp samples from the new format:
03/20/2022 11:00:00 PM
A problem arose with the release of hourly data. Time zone information is important because the clocks change twice a year, between standard and daylight time. When the clocks are set back from 02:00 AM to 01:00 AM on 05 Nov 2023, the hour starting at 01:00:00 AM occurs twice, so there will be two entries with the same local timestamp and nothing to tell them apart.
Neither timestamp version shows the offset between local time and UTC. Including that offset is standard practice when timestamps are displayed in local time.
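To illustrate why the offset matters, here is a minimal Python sketch. It assumes the timestamps are New York local time in the format of the sample above, and it needs Python 3.9+ with time zone data available on the system. The helper name `with_offset` is made up for this example. The `fold` argument is how Python tells the first and second pass through a repeated hour apart.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: the data is recorded in New York local time.
NY = ZoneInfo("America/New_York")
# Format of the sample timestamp shown above, e.g. "03/20/2022 11:00:00 PM".
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def with_offset(local_text, fold=0):
    """Parse a local timestamp and attach the UTC offset.

    fold=0 selects the first occurrence of a repeated hour (daylight time),
    fold=1 selects the second occurrence (standard time).
    """
    naive = datetime.strptime(local_text, TIMESTAMP_FORMAT)
    return naive.replace(tzinfo=NY, fold=fold)


# The same local text maps to two different instants on 05 Nov 2023.
first = with_offset("11/05/2023 01:00:00 AM", fold=0)
second = with_offset("11/05/2023 01:00:00 AM", fold=1)

print(first.isoformat())   # 2023-11-05T01:00:00-04:00
print(second.isoformat())  # 2023-11-05T01:00:00-05:00
print(second.astimezone(timezone.utc) - first.astimezone(timezone.utc))  # 1:00:00
```

With the offset written out, the two entries are distinct. Without it, a reader of the csv has to guess which one is which.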
3. Extra Information - Station Location
Is it necessary to include each station's geographic coordinates in two forms? People who need to use geographic data should be proficient in converting between x, y coordinates and a Point.
Also, does this information need to be repeated in every hourly record? Do subway station locations move that quickly?
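For what it's worth, the conversion is only a few lines. The sketch below assumes the Point is written as well-known text, such as `POINT (-73.9 40.7)`, with x (longitude) first and y (latitude) second. The function names and the sample coordinates are made up for illustration and are not taken from the data.

```python
import re

# Matches simple 2D well-known text such as "POINT (-73.9 40.7)".
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def point_to_xy(wkt):
    """Return (x, y) as floats from a 2D WKT Point string."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def xy_to_point(x, y):
    """Return a WKT Point string from x (longitude) and y (latitude)."""
    return f"POINT ({x} {y})"


print(point_to_xy("POINT (-73.9 40.7)"))  # (-73.9, 40.7)
print(xy_to_point(-73.9, 40.7))           # POINT (-73.9 40.7)
```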
4. Download Size
The csv of the weekly turnstile data was around 33 MB. The two new csv files are 521.1 MB and 548.3 MB. The major reason for the increase is that the new data isn't available in weekly chunks, and there's no way to filter the data by time before downloading.
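Until that changes, one workaround is to download the full file once and cut out a week locally. Here is a rough Python sketch that streams the csv row by row, so the whole file never has to fit in memory. The column name `transit_timestamp` is an assumption, as is the timestamp format, which is taken from the sample shown earlier. Adjust both to match the actual header.

```python
import csv
from datetime import datetime, timedelta

# Assumed format, based on the sample "03/20/2022 11:00:00 PM".
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def filter_week(src, dst, week_start, column="transit_timestamp"):
    """Copy rows from src to dst whose timestamp falls in the 7 days
    starting at week_start. src and dst are open text file objects.
    Returns the number of rows kept.
    """
    week_end = week_start + timedelta(days=7)
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        stamp = datetime.strptime(row[column], TIMESTAMP_FORMAT)
        if week_start <= stamp < week_end:
            writer.writerow(row)
            kept += 1
    return kept
```

Typical use would be to open the downloaded file and an output file with `newline=""`, as the `csv` module recommends, and call `filter_week(src, dst, datetime(2022, 3, 19))` to keep the week starting 19 Mar 2022.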