GTFS Trip Descriptor Uniqueness in Real Time feed


under...@gmail.com

Apr 23, 2019, 8:24:07 PM
to mtadeveloperresources
This is in regard to two trains currently in the real-time GTFS feed:

"093100_4..S06R",
"093100_4..S28R",


"The combination of Origin Time, Route ID and Direction can be used to identify a unique trip. The Path Identifier should be considered optional data that will only be provided when known. This could result with it being there at the start of a trip but not during portions of the trip."

So I'm confused.
Are these two different trains in the system with an equivalent descriptor? They don't seem like the same train, since I'm seeing different arrivals for these two descriptors.

Thanks!


Daniel
Underway

sm12...@gmail.com

Apr 24, 2019, 11:53:28 AM
to mtadeveloperresources
Most of the feeds don't include the shape identifier. This means that two different trips can have the same ID. This happened last night in the #7 train feed, with 2 different trips having the same "103000_7..S" ID. One departed from Main St and the other from 111th St. I've found that it's useful to use the concatenation of the trip_id and train_id strings to uniquely identify a trip. The two train IDs were "17 1710 MST/34H" and "17 1710 111/34H". N.B. the trip_id will be changed if they decide to delay the start. Thus the train_id "17 1710 MST/34H" was assigned to trips "102701_7..S" and "103000_7..S" (starting at 17:07:01 and 17:10:00 respectively). The train_id "17 1710 111/34H" was assigned to trips "102950_7..S" and "103000_7..S" (starting at 17:09:30 and 17:10:00 respectively).
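A minimal Python sketch of that composite key, using the #7 values from this post. The names `parse_trip_id` and `unique_key` are made up for illustration, and the regex assumes the trip_id layout seen in this thread: six digits of origin time in hundredths of a minute past midnight, an underscore, the route, one or more dots, N or S, then an optional path identifier.

```python
import re
from datetime import timedelta
from typing import NamedTuple, Optional, Tuple

# Assumed layout, based only on the IDs quoted in this thread.
_TRIP_ID = re.compile(r"^(?P<origin>\d{6})_(?P<route>[^.]+)\.+(?P<dir>[NS])(?P<path>.*)$")


class ParsedTripId(NamedTuple):
    origin_time: timedelta      # offset from local midnight
    route_id: str
    direction: str
    path_id: Optional[str]      # None when the feed omits it


def parse_trip_id(trip_id: str) -> ParsedTripId:
    m = _TRIP_ID.match(trip_id)
    if m is None:
        raise ValueError(f"unrecognized trip_id: {trip_id!r}")
    # One hundredth of a minute is 600 ms; integer math avoids float rounding.
    origin = timedelta(milliseconds=int(m.group("origin")) * 600)
    return ParsedTripId(origin, m.group("route"), m.group("dir"), m.group("path") or None)


def unique_key(trip_id: str, train_id: str) -> Tuple[str, str]:
    # A tuple instead of a joined string, so no separator can clash with the IDs.
    return (trip_id, train_id)


main_st = unique_key("103000_7..S", "17 1710 MST/34H")
st_111 = unique_key("103000_7..S", "17 1710 111/34H")
print(main_st != st_111)                             # the two trips no longer collide
print(parse_trip_id("102950_7..S").origin_time)      # 17:09:30
```

Because the trip_id changes when a start is delayed, the key for a given train can still change over time; matching on train_id alone is what links "102701_7..S" to "103000_7..S" here.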

This won't solve the problem all the time. Duplicate trip_id/train_id pairs are guaranteed on the first Sunday of every November, when clocks fall back from Daylight to Standard time and the 1am-2am local hour repeats.
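To illustrate the fall-back ambiguity, here is a small Python sketch. It assumes the origin time in the trip_id is local wall-clock time, encoded in hundredths of a minute past midnight, and that the system time zone database is available to `zoneinfo`. The helper `origin_code` is hypothetical. On the fall-back night the local hour from 1:00 to 2:00 occurs twice, so two departures an hour apart get the same origin code.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

NY = ZoneInfo("America/New_York")

# 2019-11-03 was the first Sunday of November 2019.
first = datetime(2019, 11, 3, 1, 30, tzinfo=NY)            # fold=0: still Daylight time
second = datetime(2019, 11, 3, 1, 30, fold=1, tzinfo=NY)   # fold=1: Standard time


def origin_code(local: datetime) -> str:
    """Hypothetical helper: wall-clock time as hundredths of a minute past midnight."""
    hundredths = (local.hour * 60 + local.minute) * 100 + (local.second * 100) // 60
    return f"{hundredths:06d}"


first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)

print(origin_code(first), origin_code(second))   # same code for both departures
print(second_utc - first_utc)                    # yet they are an hour apart
```

One way around this, offered only as an assumption, is to add a UTC-based value to the key, such as the POSIX `timestamp` of the feed header in which the trip first appeared.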

under...@gmail.com

Apr 24, 2019, 4:00:46 PM
to mtadeveloperresources
The other day I was riding a 2 train, and the Trip Descriptor's Path Identifier changed while I was still on the train. Since I was doing an exact string match on the Trip Descriptor, the old trip descriptor and the new trip descriptor were no longer equal.

After looking at the documentation, I found that the Shape Identifier shouldn't be used for equality: "The combination of Origin Time, Route ID and Direction can be used to identify a unique trip. The Path Identifier should be considered optional data that will only be provided when known. This could result with it being there at the start of a trip but not during portions of the trip."

So yesterday, I spent some time doing a partial string match on Trip Descriptors, leaving off the Path Identifier. My experience riding the 2 train reflected what the documentation said:  "This could result with it being there at the start of a trip but not during portions of the trip."
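A rough Python sketch of that partial match, which also flags descriptors that collide once the Path Identifier is dropped. The function names are made up, the regex assumes the trip_id layout seen in this thread, and "093250_4..S06R" is a hypothetical third ID added for contrast.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed layout: origin time, "_", route, dots, N/S direction, optional path identifier.
_TRIP_ID = re.compile(r"^(?P<base>\d{6}_[^.]+\.+[NS])(?P<path>.*)$")


def split_path(trip_id: str) -> Tuple[str, str]:
    """Return (descriptor without Path Identifier, Path Identifier or '')."""
    m = _TRIP_ID.match(trip_id)
    if m is None:
        return trip_id, ""   # unknown format: leave the ID untouched
    return m.group("base"), m.group("path")


def find_collisions(trip_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each base descriptor to its path IDs when more than one is present."""
    paths_by_base: Dict[str, Set[str]] = defaultdict(set)
    for trip_id in trip_ids:
        base, path = split_path(trip_id)
        if path:
            paths_by_base[base].add(path)
    return {base: paths for base, paths in paths_by_base.items() if len(paths) > 1}


snapshot = ["093100_4..S06R", "093100_4..S28R", "093250_4..S06R"]
collisions = find_collisions(snapshot)
print(collisions)   # only the 093100_4..S base has two distinct paths
```

This only makes sense within a single feed snapshot. Across snapshots, a Path Identifier that appears, disappears or changes mid-trip would look like a collision too.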

But then I ran into colliding Trip Descriptors, for what really looked like two different trains, as noted in my original post.

What key should I use to uniquely identify a train?
The Trip Descriptor minus the Path Identifier plus the train_id?

sm12...@gmail.com

Apr 24, 2019, 5:53:30 PM
to mtadeveloperresources
I'm not MTA, so nothing I say should be construed as official, definitive or even correct.

"The combination of Origin Time, Route ID and Direction can be used to identify a unique trip. The Path Identifier should be considered optional data that will only be provided when known. This could result with it being there at the start of a trip but not during portions of the trip."

I cited two counterexamples showing why this isn't true. Two different trips starting at different stations with the same route ID, origin time and direction are possible. Also, they will change the origin time between when the trip is first sent (30 minutes before scheduled departure) and its actual departure.

I have found using the train_id to be more reliable in tracking a trip. However, this has fallen apart when there are massive delays.

Aleksey Bilogur

Apr 30, 2019, 3:19:04 PM
to mtadeveloperresources
There is no true unique ID in GTFS-RT. If you want one, you will have to impute it yourself. I wrote a software library (https://github.com/ResidentMario/gtfs-tripify/tree/master/gtfs_tripify) which does this as a subroutine, which may be a good reference for your own implementation should you choose to pursue it: https://github.com/ResidentMario/gtfs-tripify/blob/master/gtfs_tripify/tripify.py#L360.

Underway

Mar 4, 2023, 2:23:09 PM
to mtadeveloperresources
Thanks! I'm reading through your code and wondering whether it preserves trip IDs across different real-time updates. For example, if I load a real-time feed at 1pm, and then load the next batch of real-time updates 5 minutes later at 1:05pm, will the trip IDs generated by your code be stable?