February 2025 bus gtfs duplicate trips


elif ensari

10:19 AM
to mtadeveloperresources
Hi,
We are working with NYC-wide static bus GTFS data for weekday trips from February 2025 and are finding several duplicate trips with distinct trip IDs.

Some of these are identical trips with exactly matching routes, stop sequences, and stop times (1). Some have identical routes, stop sequences, first departure times, and last arrival times but differ slightly in one or more intermediate stop times (2). Others have identical routes, stop sequences, and first departure times, but their stop times shift slightly partway through the trip (3).

Below are example trip routes and trip ids for each case:
1) B74:
UP_A5-Weekday-SDon-086500_B74_605, UP_A5-Weekday-SDon-086500_B6_204, UP_A5-Weekday-SDon-086500_B6_281, UP_A5-Weekday-SDon-086500_B6_206

2) B8:
JG_A5-Weekday-147400_B8_150, JG_A5-Weekday-SDon-147400_B8_150

3) Q54:
FP_A5-Weekday-121700_Q54_725, FP_A5-Weekday-SDon-121700_Q58_880
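The three cases above can be told apart mechanically by comparing the stop_times.txt rows of two trips. This is a minimal sketch, assuming each trip's rows are already loaded and sorted by stop_sequence and that trips have been grouped by route first; `classify_pair` and the tuple layout are hypothetical names chosen for illustration. It only labels a pair of trips; it does not decide which trip ID, if any, is redundant.

```python
from typing import List, Tuple

# Hypothetical layout: one stop_times.txt row for a single trip as
# (stop_id, arrival_time, departure_time). Rows are assumed to be
# pre-sorted by stop_sequence; times are GTFS "HH:MM:SS" strings.
StopTime = Tuple[str, str, str]


def classify_pair(a: List[StopTime], b: List[StopTime]) -> str:
    """Label two trips (assumed to be on the same route) by the cases in the post."""
    # All three cases require the same ordered list of stops.
    if [stop for stop, _, _ in a] != [stop for stop, _, _ in b]:
        return "different stop sequence"
    # Case 1: every stop and every time matches exactly.
    if a == b:
        return "case 1: identical"
    same_first_departure = a[0][2] == b[0][2]
    same_last_arrival = a[-1][1] == b[-1][1]
    # Case 2: same endpoints, but at least one intermediate time differs.
    if same_first_departure and same_last_arrival:
        return "case 2: same first departure and last arrival"
    # Case 3: same start, but the end time differs (times shifted partway through).
    if same_first_departure:
        return "case 3: same first departure, later times shifted"
    return "same stops, different times"
```

Running this over every pair of trips within a route (or, more cheaply, hashing each trip's full stop-time list to find case 1 groups) gives counts per case before deciding how to filter.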
My question is, what would be a reliable way of eliminating trip ids that don't represent actual trips scheduled to take place simultaneously on a weekday?
Can we use the "School Day On" tag (SDon) in the trip name to only work with those trips? That wouldn't resolve the issue in the first example.

The reason we are using an old version is that we are running ridership prediction models using multiple data sources, and we need information on bus service from before the Queens bus network redesign, as it aligns with our route-level ridership numbers.
Thank you


Jayden Lin

7:06 PM
to mtadeveloperresources
I can elaborate on each of those examples:

1) Those are each a separate bus trip, as they are school trips that serve students. I'd assume for the B74 it's to bring students from Mark Twain to the Coney Island Subway Station on school days only.

2) Those are similar trips, but they aren't the same internally at the MTA. Internally, school-open and school-closed schedules are completely different, so the trips may look the same to the average commuter, or leave someone wondering why both exist, but internally they are distinct. The same explanation applies to your point 3.

To answer your question, the most reliable way is to base the schedule on the specific date rather than on whether or not it is a weekday. Remember, not all routes are created equal (for example, the B6 has a different weekday schedule depending on whether school is open, as does the B9). GTFS normally provides a "calendar_dates.txt" file to exclude school-open schedules on days when school is closed, and vice versa. It looks something like the following (this is part of the current GTFS for Brooklyn):

service_id,date,exception_type
EN_B6-Sunday,20260525,1
EN_B6-Weekday-SDon,20260525,2
EN_B6-Weekday-SDon,20260527,2
EN_B6-Weekday-SDon,20260604,2
EN_B6-Weekday-SDon,20260619,2
EN_B6-Weekday,20260528,2
EN_B6-Weekday,20260529,2
EN_B6-Weekday,20260601,2
EN_B6-Weekday,20260602,2
EN_B6-Weekday,20260603,2
EN_B6-Weekday,20260605,2
EN_B6-Weekday,20260608,2
EN_B6-Weekday,20260609,2
EN_B6-Weekday,20260610,2
EN_B6-Weekday,20260611,2
EN_B6-Weekday,20260612,2
EN_B6-Weekday,20260615,2
EN_B6-Weekday,20260616,2
EN_B6-Weekday,20260617,2
EN_B6-Weekday,20260618,2
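The date-based approach described above can be sketched as follows: take the base day-of-week pattern from calendar.txt, then apply the calendar_dates.txt additions (exception_type 1) and removals (exception_type 2) for the chosen date. This is a minimal sketch assuming the feed also ships a calendar.txt and that both files are read as CSV text; the function name `active_service_ids` is hypothetical, the calendar.txt rows are made up for illustration, and only the calendar_dates.txt rows are copied from the excerpt above.

```python
import csv
import io
from datetime import date

# calendar.txt day columns, indexed by date.weekday() (Monday == 0).
WEEKDAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"]


def active_service_ids(calendar_csv: str, calendar_dates_csv: str, day: date) -> set:
    """Return the service_ids that run on `day` under GTFS calendar rules."""
    ymd = day.strftime("%Y%m%d")  # GTFS dates are YYYYMMDD, so string comparison works
    active = set()
    # Base pattern from calendar.txt: date range plus day-of-week flag.
    for row in csv.DictReader(io.StringIO(calendar_csv)):
        in_range = row["start_date"] <= ymd <= row["end_date"]
        if in_range and row[WEEKDAY_COLUMNS[day.weekday()]] == "1":
            active.add(row["service_id"])
    # calendar_dates.txt overrides: 1 = service added, 2 = service removed.
    for row in csv.DictReader(io.StringIO(calendar_dates_csv)):
        if row["date"] != ymd:
            continue
        if row["exception_type"] == "1":
            active.add(row["service_id"])
        elif row["exception_type"] == "2":
            active.discard(row["service_id"])
    return active


# Hypothetical calendar.txt rows (NOT taken from the MTA feed).
CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
EN_B6-Weekday,1,1,1,1,1,0,0,20260501,20260630
EN_B6-Weekday-SDon,1,1,1,1,1,0,0,20260501,20260630
EN_B6-Sunday,0,0,0,0,0,0,1,20260501,20260630
"""

# A few calendar_dates.txt rows copied from the excerpt above.
CALENDAR_DATES = """service_id,date,exception_type
EN_B6-Sunday,20260525,1
EN_B6-Weekday-SDon,20260525,2
EN_B6-Weekday-SDon,20260527,2
EN_B6-Weekday,20260528,2
"""

for d in (date(2026, 5, 27), date(2026, 5, 28)):
    print(d, sorted(active_service_ids(CALENDAR, CALENDAR_DATES, d)))
```

With these sample rows, Wednesday 2026-05-27 resolves to only EN_B6-Weekday (the school-day service is removed), while Thursday 2026-05-28 resolves to only EN_B6-Weekday-SDon. Keeping just the trips whose service_id is in the returned set for a chosen representative date would then give one consistent schedule per route instead of the union of school-open and school-closed trips.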