EntitySelector documentation clarification


Dave Barker, MBTA

Oct 5, 2016, 9:55:00 AM
to GTFS-realtime
One of our third-party developers raised an issue that highlighted an ambiguity in the GTFS-realtime guide. It came up because the MBTA includes route_type alongside every route; if that's considered bad practice, please advise and we can consider changing it. Here's the documentation issue:

The .proto indicates that if an EntitySelector has multiple criteria, that should be treated as an "and" condition not an "or" condition. 

    // A selector for an entity in a GTFS feed.
    message EntitySelector {
      // The values of the fields should correspond to the appropriate fields in the
      // GTFS feed.
      // At least one specifier must be given. If several are given, then the
      // matching has to apply to all the given specifiers.
      optional string agency_id = 1;
      optional string route_id = 2;
      // corresponds to route_type in GTFS.
      optional int32 route_type = 3;
      optional TripDescriptor trip = 4;
      optional string stop_id = 5;
    }
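To make the "and" reading of the .proto comment concrete, here is a minimal Python sketch (an editorial illustration, not code from the spec or from any GTFS-realtime library). It assumes selectors and service events are plain dicts, flattens the nested TripDescriptor to a hypothetical `trip_id` key, and uses made-up route and stop IDs.

```python
from typing import Mapping, Optional

# EntitySelector field names; the nested TripDescriptor is flattened to a
# trip_id string here for simplicity (an assumption of this sketch).
SELECTOR_FIELDS = ("agency_id", "route_id", "route_type", "trip_id", "stop_id")


def selector_matches(selector: Mapping[str, object],
                     event: Mapping[str, Optional[object]]) -> bool:
    """Return True if every field set in the selector equals the event's value.

    An unset selector field places no constraint, so the selector acts as the
    intersection ("and") of all the criteria it specifies.
    """
    populated = {k: v for k, v in selector.items()
                 if k in SELECTOR_FIELDS and v is not None}
    if not populated:
        # The spec requires at least one specifier.
        raise ValueError("EntitySelector must specify at least one field")
    return all(event.get(field) == value for field, value in populated.items())


# Hypothetical service event: a Red Line train at a given stop.
event = {"agency_id": "1", "route_id": "Red", "route_type": 1,
         "trip_id": "T100", "stop_id": "place-pktrm"}

print(selector_matches({"route_id": "Red", "stop_id": "place-pktrm"}, event))  # True
print(selector_matches({"route_id": "Red", "stop_id": "place-harsq"}, event))  # False
```

Both selectors name route "Red"; only the one whose stop_id also matches is accepted, which is the route-and-stop intersection described in the proposed guide wording.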

But there is nothing in the Alerts entity selector description in the guide to indicate that, and phrases like "whole route" and "any route of this type" can be read to imply the opposite:

Entity selector allows you to specify exactly which parts of the network this alert affects, so that we can display only the most appropriate alerts to the user. You may include multiple entity selectors for alerts which affect multiple entities.


Entities are selected using their GTFS identifiers, and you can select any of the following:

    • Agency: Affects the whole network
    • Route: Affects the whole route
    • Route type: Affects any route of this type. e.g. all subways.
    • Trip: Affects a particular trip
    • Stop: Affects a particular stop
Assuming the .proto is correct, a change to the guide documentation like the following could avoid confusion: 

Entity selector allows you to specify exactly which parts of the network this alert affects, so that we can display only the most appropriate alerts to the user. An entity selector containing multiple criteria applies only to the intersection of those criteria, i.e. if it specifies a route and a stop, then the alert impacts service on that route at that stop. You may include multiple entity selectors for alerts which affect multiple entities.


Entities are selected using their GTFS identifiers, and you can select any of the following:

    • Agency: Affects the whole network
    • Route: Affects the route
    • Route type: Affects routes of this type. e.g. subways.
    • Trip: Affects a particular trip
    • Stop: Affects a particular stop

-Dave Barker, MBTA

Stefan de Konink

Oct 5, 2016, 10:14:48 AM
to GTFS-realtime
Would it be an idea to rephrase it as a proper sentence, including something
along the lines of:

It is best practice to use only the most specific argument with the fewest
criteria.

Stefan

Sean Barbeau

Oct 6, 2016, 11:47:59 AM
to GTFS-realtime
You're right, this is definitely a gray area in the spec.

The closest the documentation comes to clarifying this is in "Best practice for managing alerts", under the Alert reference:
https://developers.google.com/transit/gtfs-realtime/reference/Alert

Be careful not to specify too many parameters as you might cancel out the alert: it will be accepted, but never shown. For example, if you define an alert for a stop AND a route, but the stop is not part of that specific route, the message may never be shown. Generally, be more generic than restrictive.
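The "accepted, but never shown" case can be checked before publishing. A rough Python sketch, assuming a hypothetical `ROUTE_STOPS` map of route_id to served stop_ids (in practice this would be derived from the static GTFS trips.txt and stop_times.txt); the function name and IDs are invented:

```python
from typing import Dict, List, Mapping, Set

# Hypothetical schedule data: which stops each route serves.
ROUTE_STOPS: Dict[str, Set[str]] = {
    "Red": {"place-pktrm", "place-harsq"},
    "Green-B": {"place-pktrm", "place-boyls"},
}


def unreachable_selectors(selectors: List[Mapping[str, str]],
                          route_stops: Mapping[str, Set[str]]) -> List[Mapping[str, str]]:
    """Return selectors whose route_id and stop_id can never occur together.

    Under "and" semantics such a selector matches nothing, so an alert that
    relies on it would be accepted but never shown.
    """
    bad = []
    for sel in selectors:
        route, stop = sel.get("route_id"), sel.get("stop_id")
        if route is not None and stop is not None:
            if stop not in route_stops.get(route, set()):
                bad.append(sel)
    return bad


selectors = [
    {"route_id": "Red", "stop_id": "place-harsq"},      # stop is on the route
    {"route_id": "Green-B", "stop_id": "place-harsq"},  # stop is not on the route
]
print(unreachable_selectors(selectors, ROUTE_STOPS))
# [{'route_id': 'Green-B', 'stop_id': 'place-harsq'}]
```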

My expectation is that if a route_type is provided, then the alert applies to all routes of that type (regardless of what route_id is included).  Otherwise, the Alert route_type field is redundant information.  So, my proposal would be that within Alerts, if route_type is provided, then route_id is ignored for that alert.
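A minimal sketch of this proposal in Python, contrasted with a plain "and" match. The function name and the route IDs are hypothetical, and the behavior shown is the proposal under discussion, not what the spec currently says:

```python
from typing import Mapping, Optional


def matches_route_type_override(selector: Mapping[str, object],
                                event: Mapping[str, Optional[object]]) -> bool:
    """Match a selector, but drop route_id whenever route_type is also set.

    Sketch of the proposal only: route_type widens the selector to every
    route of that type, and all remaining fields are still ANDed.
    """
    effective = {k: v for k, v in selector.items() if v is not None}
    if "route_type" in effective:
        effective.pop("route_id", None)
    return all(event.get(k) == v for k, v in effective.items())


selector = {"route_id": "CR-Fitchburg", "route_type": 2}
other_commuter_rail = {"route_id": "CR-Lowell", "route_type": 2}

# Under the proposal the alert reaches every route with route_type 2...
print(matches_route_type_override(selector, other_commuter_rail))  # True
# ...whereas a plain "and" match restricts it to CR-Fitchburg.
print(all(other_commuter_rail.get(k) == v for k, v in selector.items()))  # False
```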

I think this needs to be clarified as a hard requirement in the spec (vs. a best practice), otherwise there isn't a clear understanding between consumers and producers of where alerts should be displayed.

Sean

Stefan de Konink

Oct 6, 2016, 11:56:25 AM
to gtfs-r...@googlegroups.com
In general, I would sometimes see the case of AND functionality. But given the current cardinality, would it apply to GTFS?

Stefan


Sean Barbeau

Oct 6, 2016, 12:08:46 PM
to GTFS-realtime

> In general I would sometimes see the case of AND functionality

From https://developers.google.com/transit/gtfs-realtime/reference/EntitySelector:

If the data must be linked to multiple entities of the same type (such as two stops), create two different EntitySelector fields (each with a unique stop_id) for use as repeated values in the informed_entity field (in an alert, for example). See more on this at Alert.

So if you want an alert to apply to all route_type=4 (Ferry), and to route_id=5 (which is not a ferry), then you'd include multiple informed_entities:
 
    alert {
      informed_entity {
        agency_id: "1"
        route_id: "5"
      }
      informed_entity {
        agency_id: "1"
        route_type: 4
      }
    }
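The combined semantics (fields within one selector are ANDed, repeated informed_entity values are ORed) can be sketched as follows. This is an illustrative Python sketch using plain dicts; the function names, the route_type 3 bus events, and route IDs other than those in the example are hypothetical:

```python
from typing import List, Mapping, Optional


def selector_matches(selector: Mapping[str, object],
                     event: Mapping[str, Optional[object]]) -> bool:
    """One EntitySelector: every populated field must match ("and")."""
    return all(event.get(k) == v for k, v in selector.items() if v is not None)


def alert_applies(informed_entities: List[Mapping[str, object]],
                  event: Mapping[str, Optional[object]]) -> bool:
    """Repeated informed_entity: the alert applies if any selector matches ("or")."""
    return any(selector_matches(sel, event) for sel in informed_entities)


# The two informed_entity values from the example above.
informed_entities = [
    {"agency_id": "1", "route_id": "5"},
    {"agency_id": "1", "route_type": 4},
]

bus_on_route_5 = {"agency_id": "1", "route_id": "5", "route_type": 3}
ferry = {"agency_id": "1", "route_id": "F1", "route_type": 4}
bus_on_route_7 = {"agency_id": "1", "route_id": "7", "route_type": 3}

print(alert_applies(informed_entities, bus_on_route_5))  # True
print(alert_applies(informed_entities, ferry))           # True
print(alert_applies(informed_entities, bus_on_route_7))  # False
```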

> But given the current cardinality would it apply to GTFS?

EntitySelector has "Repeated" cardinality, so multiple informed_entities are allowed, such as the above.  Is that what you're asking?

So to clarify my proposal above: if route_type and route_id are included in the same informed_entity within an Alert, then route_id is ignored, and the alert applies to all routes of that route_type.

Sean

Paul Harrington

Oct 6, 2016, 12:16:27 PM
to GTFS-realtime
Even if it's not the most intuitive, it was quite clear to me from the outset. At


the first paragraph is

"A selector for a transit entity in a GTFS feed (agency, stop, route, and so on). Identify GTFS entities by using the EntitySelector field. Set the field values to correspond to the appropriate identifier fields in the GTFS feed. You must provide at least one GTFS entity identifier in the EntitySelector field. If you select several such fields, then the matching MUST apply to all the identifiers."

Stefan de Konink

Oct 6, 2016, 3:40:38 PM
to gtfs-r...@googlegroups.com
On Thursday, October 6, 2016 6:08:46 PM CEST, Sean Barbeau wrote:
> But given the current cardinality would it apply to GTFS?
>
> EntitySelector has "Repeated" cardinality, so multiple
> informed_entities are allowed, such as the above. Is that what
> you're asking?

The question is more along these lines: is it ever possible to have a
combination of fields that leads to a *more* specific instance, rather than
any instance which has the same properties anyway?

In my interpretation, the only combinations that would allow AND are:

- stop_id + something else [it applies to trips/routes at this
(parent) stop]
- agency_id + route_type [it applies to all routes of this mode within this agency]

agency_id + something else can't lead to more detail, because trip_id is
unique within the dataset and so is route_id; each can therefore belong to
only one agency in the first place. Please correct me if I am wrong(!)
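One way a validator might encode this interpretation, as a hypothetical Python sketch. The function names are invented, the rules are the ones proposed in this message rather than anything in the spec, and the trip is flattened to a `trip_id` key:

```python
from typing import Mapping

# Fields whose IDs are unique within the dataset and therefore already
# pin down a single agency (per the reasoning above).
AGENCY_IMPLYING = ("route_id", "trip_id")


def redundant_agency_id(selector: Mapping[str, object]) -> bool:
    """True if agency_id is combined with a field that already implies it.

    Such an "and" cannot narrow the selection; it can only make the
    selector unmatchable if the two values disagree.
    """
    return "agency_id" in selector and any(f in selector for f in AGENCY_IMPLYING)


def narrowing_combination(selector: Mapping[str, object]) -> bool:
    """True for the combinations listed above where "and" adds detail:
    stop_id plus another field, or agency_id plus route_type."""
    keys = {k for k, v in selector.items() if v is not None}
    if "stop_id" in keys and len(keys) > 1:
        return True
    return keys == {"agency_id", "route_type"}


print(redundant_agency_id({"agency_id": "1", "route_id": "5"}))     # True
print(narrowing_combination({"agency_id": "1", "route_type": 3}))   # True
print(narrowing_combination({"route_id": "5", "stop_id": "S1"}))    # True
print(narrowing_combination({"agency_id": "1", "route_id": "5"}))   # False
```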

So I would prefer that valid combinations are written out with the reason
why they are valid combinations, and why others are implied when a field is
used. I am not against a producer putting this extra information in, but
from a consumer perspective I would rather have the documentation say which
pairs must be supported and which field has priority in interpretation.

Hope this clears it all up.

--
Stefan

Barbeau, Sean

Oct 6, 2016, 3:54:36 PM
to gtfs-r...@googlegroups.com
> So I would prefer that valid combinations are writen out with the reason why they are valid combinations, and why others are implied when a field is used.

Agreed, that would definitely help clarify the various use cases.

On a related note, I'm starting work on a project that's going to be clarifying some of these gray areas in the GTFS-rt spec (including semantic cardinality - https://github.com/google/transit/pull/19), as well as fleshing out an open-source validator, so I'd be happy to put an official proposal together for this, including the various use cases for Alerts and informed_entities.

Dave,
Does my proposal above make sense in the context of the MBTA?

If I understand your original post correctly that would imply removing route_type from all your alerts, unless they were intended to apply to all routes of that route_type.

Sean


Stefan de Konink

Oct 6, 2016, 5:21:40 PM
to gtfs-r...@googlegroups.com
On Thursday, October 6, 2016 21:54:29 CEST, Barbeau, Sean wrote:
> On a related note, I'm starting work on a project that's going
> to be clarifying some of these gray areas in the GTFS-rt spec
> (including semantic cardinality -
> https://github.com/google/transit/pull/19), as well as fleshing
> out an open-source validator, so I'd be happy to put an official
> proposal together for this, including the various use cases for
> Alerts and informed_entities.

Sounds great. The PlannerStack QA tests may be a good starting point. I do
hope that we can come up with a document and GTFS-rt snippets that consumers
can use to validate how their products work as well.

https://github.com/plannerstack/testset

[I know there was a very nice LaTeX document somewhere, but I can't find it now]

I would certainly like to be involved in reviewing the work.

Stefan

Dave Barker, MBTA

Oct 6, 2016, 6:39:17 PM
to GTFS-realtime
Sean, 

You're right that under your proposal we'd have to remove route_type from all our alerts except those that apply to every route of a single route_type. Extrapolating, I assume we'd also have to remove route_id from alerts that apply to only one trip, since route is to route_type as trip is to route. The proposal could be described as "an entity selector may contain at most one of the following: route_type, route, or trip."
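The rule as summarized here is easy to check mechanically. A minimal Python sketch, assuming dict selectors with the trip flattened to a `trip_id` key; the function name and IDs are hypothetical:

```python
from typing import Mapping

# Under the proposal as summarized above, these fields are mutually exclusive.
EXCLUSIVE_FIELDS = ("route_type", "route_id", "trip_id")


def satisfies_one_of_rule(selector: Mapping[str, object]) -> bool:
    """Return True if the selector sets at most one of route_type, route_id, trip."""
    present = [f for f in EXCLUSIVE_FIELDS if selector.get(f) is not None]
    return len(present) <= 1


# Selector that includes every level of the hierarchy (hypothetical IDs).
full_hierarchy = {"route_type": 2, "route_id": "CR-Fitchburg", "trip_id": "CR-Weekday-1"}
trimmed = {"trip_id": "CR-Weekday-1"}

print(satisfies_one_of_rule(full_hierarchy))  # False
print(satisfies_one_of_rule(trimmed))         # True
```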

There are advantages to the way we currently do it. We include route_type, or route_type and route, or route_type and route and trip. Including them takes no extra effort, and it can save steps for the feed consumer. If a feed consumer wants all commuter rail alerts, for example, they can reliably key off the route_type value instead of having to look up the route_type of each trip's route. Some tasks that would otherwise be impossible become possible without even maintaining a copy of the GTFS.
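To illustrate the consumer-side saving, here is a hypothetical Python sketch: when a selector carries route_type, no schedule data is needed; otherwise the consumer must resolve trip to route to route_type through its own copy of GTFS. The lookup tables `ROUTE_TYPE_BY_ROUTE` and `ROUTE_BY_TRIP` stand in for routes.txt and trips.txt and are invented, as are the IDs and function names:

```python
from typing import Dict, List, Mapping, Optional

# Hypothetical extracts of GTFS routes.txt and trips.txt, only needed when
# the feed omits route_type from its selectors.
ROUTE_TYPE_BY_ROUTE: Dict[str, int] = {"CR-Fitchburg": 2, "Red": 1}
ROUTE_BY_TRIP: Dict[str, str] = {"CR-Weekday-1": "CR-Fitchburg"}


def selector_route_type(selector: Mapping[str, object]) -> Optional[int]:
    """Resolve a selector's route_type, preferring the value in the feed."""
    if selector.get("route_type") is not None:
        return int(selector["route_type"])  # no GTFS lookup needed
    route_id = selector.get("route_id")
    if route_id is None and selector.get("trip_id") is not None:
        route_id = ROUTE_BY_TRIP.get(str(selector["trip_id"]))  # trip -> route
    if route_id is None:
        return None
    return ROUTE_TYPE_BY_ROUTE.get(str(route_id))  # route -> route_type


def alerts_for_route_type(alerts: List[List[Mapping[str, object]]],
                          route_type: int) -> List[int]:
    """Return indexes of alerts with any selector resolving to route_type."""
    return [i for i, entities in enumerate(alerts)
            if any(selector_route_type(s) == route_type for s in entities)]


alerts = [
    [{"route_type": 2, "route_id": "CR-Fitchburg", "trip_id": "CR-Weekday-1"}],  # all descriptors
    [{"trip_id": "CR-Weekday-1"}],  # needs two lookups
    [{"route_id": "Red"}],
]
print(alerts_for_route_type(alerts, 2))  # [0, 1]
```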

-Dave

Stefan de Konink

Oct 6, 2016, 7:02:16 PM
to gtfs-r...@googlegroups.com
On Friday, October 7, 2016 00:39:17 CEST, Dave Barker, MBTA wrote:
> There are advantages to the way we currently do it. We include
> route_type, or route_type and route, or route_type and route and
> trip. This isn't any extra effort to include and it can save
> steps for the feed consumer. If a feed consumer wants all
> commuter rail alerts, for example, they can reliably key off of
> the route_type value and not have to look up the trip's route's
> route_type of every route it comes across. Some tasks become
> possible without even maintaining a copy of GTFS that would not
> be possible otherwise.

Agreed. But we are again entering the domain that Jorden and Andrew
described before as feeds with different levels of integration. The problem
doesn't occur when the producer knows what they are doing, and the extra
fields might even help a consumer that then needs no further enrichment. But
when the fields are ambiguous and don't match the scheduled data, the
messages might become unprocessable, which is the state we want to prevent.

From my perspective, documentation is key. If you are filling these fields
(which, with a decent system, is indeed trivial), make sure the contents
match your GTFS values. If you are consuming, process the attributes in
order from most specific to least specific. I think it will be something
like: stop_id, agency_id, trip_id, route_id, route_type. I hope Sean can
come up with a nice state-machine drawing.
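A consumer following this suggestion might pick the first populated field in that order. A minimal Python sketch, assuming dict selectors with the trip flattened to a `trip_id` key; the function name is hypothetical, and the order is the one tentatively proposed in this message, not a spec requirement:

```python
from typing import Mapping, Optional, Tuple

# Order tentatively proposed above ("something like"); not part of the spec.
PRECEDENCE = ("stop_id", "agency_id", "trip_id", "route_id", "route_type")


def primary_field(selector: Mapping[str, object]) -> Optional[Tuple[str, object]]:
    """Return the first populated (field, value) pair in precedence order, or None."""
    for field in PRECEDENCE:
        value = selector.get(field)
        if value is not None:
            return field, value
    return None


print(primary_field({"route_type": 2, "route_id": "CR-Fitchburg", "trip_id": "CR-Weekday-1"}))
# ('trip_id', 'CR-Weekday-1')
print(primary_field({"route_type": 3}))
# ('route_type', 3)
```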

--
Stefan

Dave Barker, MBTA

Oct 18, 2016, 2:20:58 PM
to GTFS-realtime
Not sure whether we've reached a consensus. If a documentation change makes the way the MBTA constructs affected_entities contrary to best practices (or incorrect), we will change our output. Otherwise, we will leave it as-is for the time being and continue to include all descriptors, even redundant ones.

-Dave

Stefan de Konink

Oct 18, 2016, 2:32:32 PM
to gtfs-r...@googlegroups.com
On Tuesday, October 18, 2016 8:20:57 PM CEST, Dave Barker, MBTA wrote:
> Not sure if we've reached a consensus? If there's a change in
> documentation that renders the way the MBTA constructs
> affected_entities contrary to best practices (or incorrect) we
> will change our output. Otherwise we will leave it as-is for the
> time being and continue to include all descriptors, even
> redundant ones.

Your correct output doesn't affect consumers. Still, we must assign a
processing sequence, so that producers may also choose to skip values.

--
Stefan

webm...@rideschedules.com

Oct 28, 2016, 6:19:33 PM
to GTFS-realtime
I am opposed to changing the documentation. 

Remember, GTFS-R alerts are new, and people, as well as software, are not fully acclimated to them yet. Many systems are still maturing and being optimized for specific use cases.

Remember, we want the Guide to be clear for ordinary people and not just software engineers.   

If you want an alert to be shown for a stop, include the stop_id. 
For a trip, include the trip_id. 
For a route, include the route_id.
For an entire modal system, include the route_type.  
For an entire agency, include the agency_id (or nothing since agency_id is optional).

It is then up to the application to design a user interface and display alert information expected by the user.

Publishers in turn use their judgement based on the substance of the alert and passenger interests to decide which entities to inform.

I do agree more detailed Guidelines would be helpful, but removing 'All' or adding 'Intersection' to the Guide, as suggested, would not promote clarity.

Adding examples of common alert scenarios would be helpful for both publishers and consumers.

For example,

How to report a stop out of service.
How to report a detour.
How to report alerts affecting a single route or group of routes.
How to report alerts affecting the subway or other modal system.
How to report system-wide alerts.

I think the best selection of informed entities is the most restrictive one that does not create ambiguity or irrelevance, but it ultimately requires good judgment based on the substance of the alert.

The intuitive rules I surmise from the documentation are as follows:

For an agency-wide alert, inform only the agency_id or nothing. 
For a modal system-wide alert, inform only the route_type.
For an alert affecting one or a handful of routes, inform only the route_ids.
For an alert affecting a specific route trip, inform only the trip_id.
For an alert affecting one or a handful of stops, inform the stop_ids and route_ids that service them. 
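These rules map fairly directly onto lists of informed_entity selectors. A hypothetical Python sketch, using dicts for selectors, a flattened `trip_id`, invented function names, and an invented stop-to-route map; it assumes (one possible reading) that the route_ids for a stop alert go in their own selectors rather than being ANDed with the stop_id:

```python
from typing import Dict, List, Mapping, Sequence, Set

# A selector is modeled as a dict of EntitySelector field names to values.
Selector = Dict[str, object]


def agency_alert(agency_id: str) -> List[Selector]:
    # Agency-wide alert: inform only the agency_id.
    return [{"agency_id": agency_id}]


def mode_alert(route_type: int) -> List[Selector]:
    # Modal system-wide alert: inform only the route_type.
    return [{"route_type": route_type}]


def routes_alert(route_ids: Sequence[str]) -> List[Selector]:
    # One or a handful of routes: one selector per route_id.
    return [{"route_id": r} for r in route_ids]


def trip_alert(trip_id: str) -> List[Selector]:
    # A specific trip: inform only the trip (flattened to trip_id here).
    return [{"trip_id": trip_id}]


def stops_alert(stop_ids: Sequence[str],
                routes_serving: Mapping[str, Sequence[str]]) -> List[Selector]:
    # One selector per stop_id, then one per distinct route_id serving those
    # stops. Assumption: separate selectors, not stop AND route pairs.
    entities: List[Selector] = [{"stop_id": s} for s in stop_ids]
    seen: Set[str] = set()
    for s in stop_ids:
        for r in routes_serving.get(s, ()):
            if r not in seen:
                seen.add(r)
                entities.append({"route_id": r})
    return entities


# Hypothetical stop-to-route map.
serving = {"S1": ["5", "7"], "S2": ["7"]}
print(stops_alert(["S1", "S2"], serving))
# [{'stop_id': 'S1'}, {'stop_id': 'S2'}, {'route_id': '5'}, {'route_id': '7'}]
```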

The only non-intuitive rule is the route_ids with a stop alert.  If route_ids aren't included, the alert would have to become system-wide or hidden for users interested in all published alerts. 

Other intuitive options, such as including stop_id with a route_id for a detour, are mostly stylistic since the additional entities wouldn't be irrelevant.  

Remember, in this context strict = nice and optional = rude because everybody wants the right way of doing things to be obvious, which is nice. 

Ritesh Warade

Oct 31, 2016, 10:31:04 AM
to gtfs-r...@googlegroups.com

+1 for:

> Adding examples of common alert scenarios would be helpful for both publishers and consumers.
>
> For example,
>
> How to report a stop out of service.
> How to report a detour.
> How to report alerts affecting a single route or group of routes.
> How to report alerts affecting the subway or other modal system.
> How to report system-wide alerts.

---

Ritesh Warade

 

