Gathering Rules for a GTFS-rt Validator


Nipuna Gunathillake

Mar 22, 2015, 5:10:57 PM
to gtfs-r...@googlegroups.com
Hello everyone, 

I'm a student hoping to take part in the GSoC project to build a GTFS-realtime validator for OpenTripPlanner, as suggested on the OTP wiki's GSoC ideas page.

This idea was discussed further on the OTP developer mailing list:

The general idea behind the proposed project is to create a stand-alone application that validates GTFS-rt feeds. 
The application should take GTFS and GTFS-rt feeds and evaluate them over a period of time, reporting errors and warnings. 

Sean Barbeau suggested that a discussion should be started here to identify the rules and best practices that should be checked by a potential validator. 
I would like to kindly ask for your help in formulating the rules that should be checked by a GTFS-rt validator.


Some sample rules that should be checked have been mentioned on the wiki page.

I've read through the references on the Google Transit pages.
From there, some of the requirements are clear, such as timestamps using the POSIX time format.
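For example, a minimal sketch of such a timestamp check (using plain epoch-second values, not any particular validator API, and with thresholds I've made up for illustration) might be:

```java
// Hypothetical check: GTFS-rt timestamps are POSIX time, i.e. seconds
// (not milliseconds) since 1970-01-01 UTC, and should not be in the future.
public class TimestampCheck {

    // Returns true if ts looks like epoch *seconds* and is not ahead of "now".
    static boolean isValidPosixTimestamp(long ts, long nowSeconds) {
        if (ts <= 0) return false;                 // missing or negative
        if (ts > 100_000_000_000L) return false;   // likely milliseconds, not seconds
        return ts <= nowSeconds;                   // feed time shouldn't be in the future
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        System.out.println(isValidPosixTimestamp(now, now));          // true
        System.out.println(isValidPosixTimestamp(now * 1000L, now));  // false: milliseconds
    }
}
```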

But some of the rules that should be validated need a deeper and more practical understanding of the GTFS-rt format to identify. For example, sample rule 6 in the wiki:
"If both vehicle positions and trip updates are provided, VehicleDescriptor or TripDescriptor values should match between the two feeds (warning)."
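To make that rule concrete (a rough sketch only; the class and method names here are hypothetical, not the real GTFS-rt protobuf bindings), the check essentially compares the ID sets drawn from the two feeds:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of sample rule 6: trip IDs referenced by the
// vehicle positions feed should also appear in the trip updates feed.
public class DescriptorMatchCheck {

    // Returns the trip IDs seen in vehicle positions but missing from
    // trip updates; an empty result means the rule passes, otherwise
    // each entry would be reported as a warning.
    static Set<String> unmatchedTripIds(Set<String> vehicleTripIds,
                                        Set<String> tripUpdateTripIds) {
        Set<String> missing = new HashSet<>(vehicleTripIds);
        missing.removeAll(tripUpdateTripIds);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> vehicles = new HashSet<>(Arrays.asList("trip_1", "trip_2"));
        Set<String> updates = new HashSet<>(Arrays.asList("trip_1"));
        System.out.println(unmatchedTripIds(vehicles, updates)); // [trip_2]
    }
}
```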

I'd like to ask for your help in formulating these rules. 

Also, I was thinking about setting up a small wiki page stating the rules and best practices, like it has been done for the GTFS validator (link). 
I can maintain a wiki page containing the rules that need to be validated, as discussed here.

Hopefully it will help anyone building a validator tool, as well as those implementing a GTFS-rt feed.

Any other ideas on how to gather the requirements and/or how to document them would also be much appreciated. 
 
Thank you. 

Best Regards,
Nipuna Gunathilake

Sean Barbeau

Mar 23, 2015, 10:41:50 AM
to gtfs-r...@googlegroups.com
Nipuna,
Thanks for posting here.  I'd also encourage you to follow this thread in case there is further comment from Google on their interpretation of semantic cardinality of fields (i.e., what data elements should be populated by GTFS-rt producers):
https://groups.google.com/d/msg/gtfs-realtime/wm3W7QIEZ9Y/kBs5zq_VYO4J

Also, to clarify scope: this GTFS-rt validator would be used in the context of any GTFS-rt feed, not just ones to be used with OTP.  However, I do think it's good to use OTP as a sample consumer when working on the validator, since it gives you some context for why certain fields should be mandatory, etc.



Also, I was thinking about setting up a small wiki page stating the rules and best practices, like it has been done for the GTFS validator (link). 
I can maintain a wiki page containing the rules that need to be validated, as discussed here.

Agreed, I think this would be good.

Sean

Nipuna Gunathillake

Mar 24, 2015, 12:07:16 AM
to gtfs-r...@googlegroups.com

Thanks for posting here.  I'd also encourage you to follow this thread in case there is further comment from Google on their interpretation of semantic cardinality of fields (i.e., what data elements should be populated by GTFS-rt producers):
https://groups.google.com/d/msg/gtfs-realtime/wm3W7QIEZ9Y/kBs5zq_VYO4J

I've started to follow that conversation. If there are any changes, I'll take them into consideration.
 
Also, to clarify scope: this GTFS-rt validator would be used in the context of any GTFS-rt feed, not just ones to be used with OTP.  However, I do think it's good to use OTP as a sample consumer when working on the validator, since it gives you some context for why certain fields should be mandatory, etc.
 
Yes, of course; a standalone application should be usable by anyone who wants to validate a GTFS-rt feed. 
I will take a better look at using OTP as a consumer. 

Also, we would need to generate feeds to test the validator itself. Would the OneBusAway feed generators work for this purpose? 


Also, I was thinking about setting up a small wiki page stating the rules and best practices, like it has been done for the GTFS validator (link). 
I can maintain a wiki page containing the rules that need to be validated, as discussed here.

Agreed, I think this would be good.

I will start working on it today and post a sample wiki here. 
Hopefully it would be easier to get feedback that way.  

Best Regards,
Nipuna. 

Sean Barbeau

Mar 24, 2015, 9:22:52 AM
to gtfs-r...@googlegroups.com
Also, we would need to generate feeds to test the validator itself. Would the OneBusAway feed generators work for this purpose?

I'd set up unit tests within the validator itself with known pass and failure conditions, so we know each rule is working correctly.  The easiest way to do this for a polling implementation would be to simply bundle the encoded protobuf files for the tests with the app - you could simply copy the output of the OneBusAway or other feed generators.  For a streaming WebSockets example, we could set up included tests as well; see https://github.com/OneBusAway/onebusaway-gtfs-realtime-exporter/tree/master/src/test/java/org/onebusaway/gtfs_realtime/exporter.

So, the validator project should be entirely self-contained, including tests.

Sean
 

Nipuna Gunathillake

Mar 24, 2015, 5:38:36 PM
to gtfs-r...@googlegroups.com

I'd set up unit tests within the validator itself with known pass and failure conditions, so we know each rule is working correctly.  The easiest way to do this for a polling implementation would be to simply bundle the encoded protobuf files for the tests with the app - you could simply copy the output of the OneBusAway or other feed generators.  For a streaming WebSockets example, we could set up included tests as well; see https://github.com/OneBusAway/onebusaway-gtfs-realtime-exporter/tree/master/src/test/java/org/onebusaway/gtfs_realtime/exporter.

Okay, that sounds like a much better solution. The validator wouldn't have to rely on any external tools then?

Then, for each rule there must be: 
  • A pluggable method testing for a given known error 
  • A unit test (JUnit?) for the failure case and sample protobuf(s) to go with it 
  • A wiki entry detailing what is tested and why it's tested 
And: 
  • A common, correctly formatted protobuf for testing the pass conditions of all the unit tests
Do we need anything else per test? 

Also, for the wiki page, how should it be structured? I thought of:
1. Separating the warnings and errors, as has been done on the GTFS validator wiki page
2. Categorizing the checks done for each element (in the same order as the specification page)
I feel that the second option will give a clearer idea of the checks, but there might be some redundancy (e.g., latitude/longitude tests) and some of the checks would be out of place (e.g., checks dealing with two feed entity types).

Which one would be better? Are there any other approaches that would suit this better?
Thank you. 

Best Regards,
Nipuna. 
 


Sean Barbeau

Mar 24, 2015, 5:54:48 PM
to gtfs-r...@googlegroups.com
The validator wouldn't have to rely on any external tools then?

Right, just point it at GTFS and GTFS-rt data and watch it go.


Then, for each rule there must be: 
  • A pluggable method testing for a given known error 
  • A unit test (JUnit?) for the failure case and sample protobuf(s) to go with it 
  • A wiki entry detailing what is tested and why it's tested
Yes - I'd also add a unit test for the success case (I see now that you mentioned this later in your message), and an example protobuf, for each rule. So:
  • A pluggable method testing for a given known rule (these could be implemented as unit/integration tests themselves)
  • A unit test (JUnit?) for the success/failure cases and sample protobuf(s) to go with each case (or a sample implementation for WebSockets/incremental updates)
  • A wiki entry detailing what is tested and why it's tested (the code should also be well-commented)

Do we need anything else per test?

A way to log either warnings or errors or both would be useful, perhaps with options to filter by a specific warning/error.  If the tool runs over an extended time period, being able to filter out noise you're not interested in and focus on warnings or errors for a few specific rules would be useful.  Taking a close look at the command-line options of the existing GTFS feed validator would likely provide helpful guidance: https://github.com/google/transitfeed/wiki/FeedValidator#command-line-options.
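As a rough illustration of that filtering idea (the rule IDs and fields here are hypothetical, not from any existing validator), narrowing the findings to a few specific rules could be as simple as:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each finding carries a rule ID and severity, so
// console or web output can be narrowed to the few rules you care about
// during a long-running validation session.
public class FindingFilter {

    static class Finding {
        final String ruleId, severity, message;
        Finding(String ruleId, String severity, String message) {
            this.ruleId = ruleId;
            this.severity = severity;
            this.message = message;
        }
    }

    // Keeps only the findings whose rule ID is in the wanted set.
    static List<Finding> filter(List<Finding> all, Set<String> wantedRuleIds) {
        List<Finding> out = new ArrayList<>();
        for (Finding f : all) {
            if (wantedRuleIds.contains(f.ruleId)) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Finding> all = Arrays.asList(
            new Finding("W001", "WARNING", "timestamp missing"),
            new Finding("E002", "ERROR", "unknown trip_id"));
        Set<String> wanted = new HashSet<>(Arrays.asList("E002"));
        System.out.println(filter(all, wanted).size()); // prints 1
    }
}
```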
 
Also, for the wiki page, how should it be structured? I thought of
1. Separating the warnings and errors, as has been done on the GTFS validator wiki page

Yes, I think something like https://github.com/google/transitfeed/wiki/FeedValidatorErrorsAndWarnings would work.  Assigning a specific ID to each warning/error seems helpful.


Categorizing the checks done for each element (in the same order as the specification page)

I'd try to logically group them as closely to the spec as possible - some deviation would be ok, as long as there is general organization as mentioned above.

Nipuna Gunathillake

Mar 27, 2015, 2:44:31 PM
to gtfs-r...@googlegroups.com
  • A pluggable method testing for a given known rule (these could be implemented as unit/integration tests themselves)
I'm not entirely clear on how this would work. 
If it's going to be implemented as unit/integration tests, would that mean using an existing framework?
Or something similar to those types of tests, implemented just for the application? 
  • A unit test (JUnit?) for the success/failure cases and sample protobuf(s) to go with each case (or a sample implementation for WebSockets/incremental updates)
 Got it. 
  • A wiki entry detailing what is tested and why it's tested (the code should also be well-commented)
Of course. 
Should they refer back to the error code (in the wiki) or be completely self-explanatory? 
 

A way to log either warnings or errors or both would be useful, perhaps with options to filter by a specific warning/error.  If the tool runs over an extended time period, being able to filter out noise you're not interested in and focus on warnings or errors for a few specific rules would be useful.  Taking a close look at the command-line options of the existing GTFS feed validator would likely provide helpful guidance: https://github.com/google/transitfeed/wiki/FeedValidator#command-line-options.

How about using a web interface to display the errors while the validator is running? 
Since this tool may run for days, the required data might be buried in a console log, even one filtered by options.
A web page that updates in real time would allow for easy filtering of the data as well. 

Of course, a good console log could be used alongside a web interface as well.
 
Also, for the wiki page, how should it be structured? I thought of
1. Separating the warnings and errors, as has been done on the GTFS validator wiki page

Yes, I think something like https://github.com/google/transitfeed/wiki/FeedValidatorErrorsAndWarnings would work.  Assigning a specific ID to each warning/error seems helpful.

Categorizing the checks done for each element (in the same order as the specification page)

I'd try to logically group them as closely to the spec as possible - some deviation would be ok, as long as there is general organization as mentioned above.

Got it. I'll start working on the wiki page soon.

I've submitted my GSoC proposal for this project. I understand that it may or may not be selected. 
But is it okay if I start working on this right away in my free time? 

Sean Barbeau

Mar 27, 2015, 5:18:14 PM
to gtfs-r...@googlegroups.com
I'm not entirely clear on how this would work. 
If it's going to be implemented as unit/integration tests, would that mean using an existing framework?
Or something similar to those types of tests, implemented just for the application?

Sorry, I wasn't very clear here.  At a high level, I had something in mind where the logic for the rules is written with something like the Java "assert" statement:
http://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html

Generally, I was thinking it would be good to try to leverage an existing framework that is intended to evaluate pass/fail conditions, to make it clear where the rules are evaluated in the code and to simplify the required boilerplate code as much as possible.
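As a rough sketch of that idea (all names here are hypothetical, and a real implementation would likely sit on top of JUnit or a similar framework rather than this hand-rolled runner), each rule could be a small pluggable pass/fail check whose results are collected centrally:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each rule is a small pluggable pass/fail check, and
// a runner evaluates every rule against a feed snapshot and collects findings.
public class RuleRunner {

    interface Rule {
        String id();
        // Returns null on pass, or a human-readable message on fail.
        String check(long headerTimestamp, long nowSeconds);
    }

    static List<String> runAll(List<Rule> rules, long headerTs, long nowSeconds) {
        List<String> findings = new ArrayList<>();
        for (Rule r : rules) {
            String msg = r.check(headerTs, nowSeconds);
            if (msg != null) findings.add(r.id() + ": " + msg);
        }
        return findings;
    }

    // Example rule: the feed header timestamp should not be in the future.
    static final Rule NOT_IN_FUTURE = new Rule() {
        public String id() { return "E001"; }
        public String check(long ts, long now) {
            return ts <= now ? null : "header timestamp is in the future";
        }
    };

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList(NOT_IN_FUTURE), 200L, 100L));
    }
}
```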


Of course. 
Should they refer back to the error code (in the wiki) or be completely self-explanatory?

Upon further thought, it's probably better to version the "wiki" documentation as well, which on GitHub would mean bundling a README and other Markdown-encoded documentation *within* the repo, vs. manually adding a wiki page outside of a GitHub repo.  This way, if rules are changed or added, the same commit can reference the change to the code w/ comments and the change to the documentation.  It will be easier to keep both in sync, rather than relying on manually editing the wiki and trying to keep up with the code.


How about using a web interface to display the errors while the validator is running? 
Since this tool may run for days, the required data might be buried in a console log, even one filtered by options.
A web page that updates in real time would allow for easy filtering of the data as well. 

Of course, a good console log could be used alongside a web interface as well.

Yes, I think both will be useful, but I do like having a web interface showing the errors.  As referenced in the other thread earlier, this would also allow a 3rd party to host the validation tool, and agencies to plug in URLs to their GTFS/GTFS-rt data to a web page and then see validation output via the same page.


But is it okay if I start working on this right away in my free time?

Absolutely!  That would be awesome.  I was getting ready to set up a new GitHub repo for this, and I stumbled on an old OneBusAway project that is worth taking a good look at:
https://github.com/OneBusAway/onebusaway-gtfs-realtime-munin-plugin

It's a plugin for the Munin monitoring tool:
http://munin-monitoring.org/

...which I know very little about :).  But it seems like it has similar goals in mind.  One catch is that onebusaway-gtfs-realtime-munin-plugin doesn't seem to consume any GTFS data currently - just real-time.  You might want to get this set up as-is and see what Munin can do and whether it could be used for the GTFS-rt validator.  And feel free to fork it on GitHub and make changes.

Sean

Nipuna Gunathillake

Mar 28, 2015, 4:17:33 PM
to gtfs-r...@googlegroups.com
Sorry, I wasn't very clear here.  At a high level, I had something in mind where the logic for the rules is written with something like the Java "assert" statement:
http://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html

Generally, I was thinking it would be good to try to leverage an existing framework that is intended to evaluate pass/fail conditions, to make it clear where the rules are evaluated in the code and to simplify the required boilerplate code as much as possible.

Got it, I'll take a look at the available frameworks that can be used. 
Something like JUnit would really make it easy to see where the tests are being plugged in and what they are supposed to do.
 

Of course. 
Should they refer back to the error code (in the wiki) or be completely self-explanatory?

Upon further thought, it's probably better to version the "wiki" documentation as well, which on GitHub would mean bundling a README and other Markdown-encoded documentation *within* the repo, vs. manually adding a wiki page outside of a GitHub repo.  This way, if rules are changed or added, the same commit can reference the change to the code w/ comments and the change to the documentation.  It will be easier to keep both in sync, rather than relying on manually editing the wiki and trying to keep up with the code.
 
Right, so rather than having a separate versioning system for the docs, like the GitHub wiki pages, we'll have the documents along with the source code in the same repo? 
The only possible problem I can see with this approach is collecting requirements before starting the development process (code commits will come after the initial commits of the documentation). 

How about using a web interface to display the errors while the validator is running? 
Since this tool may run for days, the required data might be buried in a console log, even one filtered by options.
A web page that updates in real time would allow for easy filtering of the data as well. 

Of course, a good console log could be used alongside a web interface as well.

Yes, I think both will be useful, but I do like having a web interface showing the errors.  As referenced in the other thread earlier, this would also allow a 3rd party to host the validation tool, and agencies to plug in URLs to their GTFS/GTFS-rt data to a web page and then see validation output via the same page.

Great! I'll try to make a small prototype of the web interface in the next week. 
It would show the input of the URLs and the outputs generated using mock data. 
 
But is it okay if I start working on this right away in my free time?

Absolutely!  That would be awesome.  I was getting ready to set up a new GitHub repo for this, and I stumbled on an old OneBusAway project that is worth taking a good look at:
https://github.com/OneBusAway/onebusaway-gtfs-realtime-munin-plugin

It's a plugin for the Munin monitoring tool:
http://munin-monitoring.org/

...which I know very little about :).  But it seems like it has similar goals in mind.  One catch is that onebusaway-gtfs-realtime-munin-plugin doesn't seem to consume any GTFS data currently - just real-time.  You might want to get this set up as-is and see what Munin can do and whether it could be used for the GTFS-rt validator.  And feel free to fork it on GitHub and make changes.

I've spent a couple of hours today looking at the Munin plugin and the tool itself.
I managed to get Munin installed and the plugin running for the most part (it's supposed to draw a vehicle-count graph, which I couldn't get it to draw).

From what I understand, the plugins are pretty simple to write. (http://munin-monitoring.org/wiki/HowToWritePlugins)

What's needed to draw a time-series graph is:
1. A simple executable (e.g., gtfs-realtime) that can be run periodically, returning the values to be updated in the following format:
load.value 0.06
2. A config mode for the same command (e.g., gtfs-realtime config) that returns the format of the graph and where the values should be inserted:
graph_title Load average
graph_vlabel load
load.label value
The OneBusAway plugin was also just one .java file that counts the number of vehicles and trip updates. 

Using this data, Munin will draw a simple time-series graph that gets updated every 5 minutes via a cron job. 
This can be done in any programming language, as long as the executable can be run automatically (via a cron job) and returns properly formatted output. 
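To make that protocol concrete, a minimal Java version of such a plugin might look like the sketch below (the graph and field names are made up for illustration, and the vehicle count is stubbed rather than read from a real feed):

```java
// Hypothetical sketch of the Munin plugin protocol described above: one
// executable that prints graph metadata when called with "config", and
// plain "name.value N" lines when called with no argument.
public class GtfsRealtimeMuninPlugin {

    // In a real plugin this would poll the GTFS-rt feed and count
    // VehiclePosition entities; stubbed with a fixed number here.
    static int countVehicles() {
        return 42;
    }

    public static void main(String[] args) {
        if (args.length > 0 && args[0].equals("config")) {
            System.out.println("graph_title GTFS-rt vehicle count");
            System.out.println("graph_vlabel vehicles");
            System.out.println("vehicles.label vehicles");
        } else {
            System.out.println("vehicles.value " + countVehicles());
        }
    }
}
```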

The plugins seem relatively easy to build, but there are some problems that I feel might limit the use of Munin as a base for the validation tool: 

1. The use of time-series graphs might not work for all validation needs. Those charts are the only form of data representation I've seen as output.
That being said, we can run any amount of validation in the same code that generates the graph (the output would not be visible in the Munin interface, but could be used somewhere else). 

2. Setting up the tool might be a bit difficult. We have to get Munin up and running, then edit some configuration files (to give the URLs, etc.). This might be a bit daunting.

3. It seems to be mainly targeted at Unix-like systems. I'm not sure what operating systems we should be targeting, but Windows doesn't seem to be particularly well supported by Munin. 

4. It's very unlikely that this could become a stand-alone application. The need to edit multiple config files also means that admin rights would be needed on the machine that runs the validator. 

But as I mentioned, I've only played around with Munin for a couple of hours today, so I might be wrong on some of those points. 

Best Regards,
Nipuna.

Sean Barbeau

Mar 30, 2015, 3:41:11 PM
to gtfs-r...@googlegroups.com
Right, so rather than having a separate versioning system for the docs, like the GitHub wiki pages, we'll have the documents along with the source code in the same repo? 
The only possible problem I can see with this approach is collecting requirements before starting the development process (code commits will come after the initial commits of the documentation).

Yes.  It's fine to start the documentation in a wiki format and then transition it to documents in the same repo.  When code starts to materialize the wiki documentation can be transitioned to live in the repo with the code.


The plugins seem relatively easy to build, but there are some problems that I feel might limit the use of Munin as a base for the validation tool:

Agreed, based on your initial look it sounds like the validation tool would be better without the dependency on Munin.  An important part of the tool is a low learning curve and ease of execution, as well as good output appropriate for all our rules, and it sounds like Munin doesn't fit well for those items.

Sean

Nipuna Gunathillake

Apr 1, 2015, 10:18:19 AM
to gtfs-r...@googlegroups.com
Yes.  It's fine to start the documentation in a wiki format and then transition it to documents in the same repo.  When code starts to materialize the wiki documentation can be transitioned to live in the repo with the code.
 
I've started gathering some of the rules (from the GSoC wiki page and the specs). I'll post them online this week. 
Hopefully we can get them verified/improved and gather other rules for the validator. 
 
Agreed, based on your initial look it sounds like the validation tool would be better without the dependency on Munin.  An important part of the tool is a low learning curve and ease of execution, as well as good output appropriate for all our rules, and it sounds like Munin doesn't fit well for those items.

That's true.
Still, the way Munin plugins work is interesting; that might be useful when designing rules that can be easily plugged in. 

I've created a small mock-up for the online part of the tool and uploaded it to Invisio. 
Anyone can comment on the prototype at the link 
(turn on the comments at the bottom right of the screen).

Any feedback on those would be much appreciated.
Also, if it's difficult to provide feedback there, I'll upload it somewhere else. 

Best Regards,
Nipuna 

Barbeau, Sean

Apr 1, 2015, 1:43:04 PM
to gtfs-r...@googlegroups.com

From a quick look, the mockup looks good to me so far.  Note that it may also be possible for an agency to offer a single GTFS-rt API endpoint that contains multiple types of updates (Vehicles + Trips) – whether this is currently allowed by the GTFS-rt spec is something we still need to confirm with the community.  If so, the easiest way to handle this would be to add another line of text under the fields saying that if multiple update types are included in the feed, the agency should just repeat the URL in each text box.  So if Vehicles + Trips were included in the same endpoint, the user would enter:

• GTFS-realtime (Vehicles) – http://my.feed.com/realtime
• GTFS-realtime (Alerts) – xxx
• GTFS-realtime (Trip Updates) – http://my.feed.com/realtime

…and of course the underlying validator code would need to handle it accordingly.

 

Sean


Nipuna Gunathillake

Apr 1, 2015, 3:45:54 PM
to gtfs-r...@googlegroups.com


On Wednesday, April 1, 2015 at 11:13:04 PM UTC+5:30, Sean Barbeau wrote:

From a quick look, the mockup looks good to me so far.  Note that it may also be possible for an agency to offer a single GTFS-rt API endpoint that contains multiple types of updates (Vehicles + Trips) – whether this is currently allowed by the GTFS-rt spec is something we still need to confirm with the community.  If so, the easiest way to handle this would be to add another line of text under the fields saying that if multiple update types are included in the feed, the agency should just repeat the URL in each text box.  So if Vehicles + Trips were included in the same endpoint, the user would enter:

• GTFS-realtime (Vehicles) – http://my.feed.com/realtime
• GTFS-realtime (Alerts) – xxx
• GTFS-realtime (Trip Updates) – http://my.feed.com/realtime

…and of course the underlying validator code would need to handle it accordingly.


Thank you for the feedback. I'll add a way to combine the two fields when a check-box is ticked.
Is there any way to get a confirmation on the possibility of joining the two feeds? (Shall I just keep following the previous thread?)

Any thoughts on the "Detailed log" page? What other filters should there be?

Best Regards,
Nipuna. 