I am working with the MTA's 2019 subway turnstile entry and exit data to estimate a desirability score for each station in NYC. To do this, I kept only the 4-hour reporting windows that end between 11 am and 1 pm and calculated net entries and net exits for each turnstile (control area + remote unit + SCP identifies a unique turnstile). I removed negative and implausibly large net entry and exit values, which account for about 1.5% of the data. I then split the data into summer and winter to compare the stations' desirability across seasons.
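For anyone trying to reproduce this, here is a minimal sketch of the per-turnstile calculation described above. It is not the exact code I used. It assumes the rows have already been parsed into `(control_area, unit, scp, timestamp, entries, exits)` tuples, where `entries` and `exits` are cumulative register readings, as in the MTA files. The cutoff `MAX_PLAUSIBLE` is a hypothetical value, and requiring exactly 4 hours between readings is also an assumption. One design choice worth noting: a window is dropped when either its entry or its exit delta is bad, so that entries and exits are always compared over the same set of windows.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta

# Hypothetical cutoff for one 4-hour window; tune it to the data.
MAX_PLAUSIBLE = 10_000


def net_counts(rows, max_plausible=MAX_PLAUSIBLE):
    """Return {(ca, unit, scp, window_end): (net_entries, net_exits)}.

    rows: iterable of (ca, unit, scp, timestamp, entries, exits) tuples,
    where entries/exits are cumulative register readings.
    """
    # Group the readings by unique turnstile (control area + unit + SCP).
    by_turnstile = defaultdict(list)
    for ca, unit, scp, ts, entries, exits in rows:
        by_turnstile[(ca, unit, scp)].append((ts, entries, exits))

    result = {}
    for key, readings in by_turnstile.items():
        readings.sort()  # chronological order per turnstile
        for (t0, e0, x0), (t1, e1, x1) in zip(readings, readings[1:]):
            # Keep only 4-hour windows ending between 11:00 and 13:00.
            if t1 - t0 != timedelta(hours=4):
                continue
            if not (time(11, 0) <= t1.time() <= time(13, 0)):
                continue
            d_in, d_out = e1 - e0, x1 - x0
            # Drop the whole window if either delta is negative (counter
            # reset) or implausibly large, so entries and exits stay paired.
            if not (0 <= d_in <= max_plausible and 0 <= d_out <= max_plausible):
                continue
            result[key + (t1,)] = (d_in, d_out)
    return result
```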
However, in both summer and winter there is a discrepancy: net entries and net exits are not equal, or even close. The difference between net entries and net exits is about 13% of net entries in the summer data and about 20% in the winter data. The discrepancy appears to be evenly distributed across days and stations.
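To make the percentages above concrete, the discrepancy is measured as (net entries − net exits) / net entries. Below is a small sketch of that calculation. It assumes a mapping shaped like `{(ca, unit, scp, window_end): (net_entries, net_exits)}`, and `discrepancy_share` is a hypothetical helper name. Passing a different grouping function, for example by station unit or by date, is one way to check whether the gap is really spread evenly across stations and days.

```python
from collections import defaultdict


def discrepancy_share(net, key_fn):
    """Return {group: (entries - exits) / entries} for each group.

    net: {(ca, unit, scp, window_end): (net_entries, net_exits)}
    key_fn: maps a key tuple to a group label (e.g. station or date).
    """
    totals = defaultdict(lambda: [0, 0])
    for key, (d_in, d_out) in net.items():
        group = key_fn(key)
        totals[group][0] += d_in
        totals[group][1] += d_out
    # Skip groups with no entries to avoid dividing by zero.
    return {g: (e - x) / e for g, (e, x) in totals.items() if e > 0}
```

For example, `discrepancy_share(net, lambda k: k[1])` groups by remote unit, and `lambda k: k[3].date()` groups by day when the window end is a `datetime`.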
Does anyone know why net entries are consistently greater than net exits?
Deniz Aleyna Akbasaran
Bogazici University, Istanbul, Turkey