Also, I was thinking about setting up a small wiki page stating the rules and best practices, like it has been done for the GTFS validator (link). I can maintain a wiki page containing the rules that need to be validated, as discussed here.
Thanks for posting here. I'd also encourage you to follow this thread in case there is further comment from Google on their interpretation of semantic cardinality of fields (i.e., what data elements should be populated by GTFS-rt producers):
https://groups.google.com/d/msg/gtfs-realtime/wm3W7QIEZ9Y/kBs5zq_VYO4J
Also to clarify scope, this GTFS-rt validator would be used in the context of any GTFS-rt feed, not just ones to be used with OTP. However, I do think it's good to use OTP as a sample consumer when working on the validator, since it gives you some context of why certain fields should be mandatory, etc.
Also, I was thinking about setting up a small wiki page stating the rules and best practices, like it has been done for the GTFS validator (link). I can maintain a wiki page containing the rules that need to be validated, as discussed here.
Agreed, I think this would be good.
Also, we would need to generate feeds to test the validator itself. Would the OneBusAway feed generators work for this purpose?
I'd set up unit tests within the validator itself with known pass and failure conditions, so we know each rule is working correctly. The easiest way to do this for a polling implementation would be to simply bundle the encoded protobuf files for tests with the app - you could simply copy the output of the OneBusAway or other feed generators. For the streaming WebSockets example we could set up included tests as well, see - https://github.com/OneBusAway/onebusaway-gtfs-realtime-exporter/tree/master/src/test/java/org/onebusaway/gtfs_realtime/exporter.
I feel that this will give a clearer idea of the checks, but there might be some redundancy (e.g., latitude/longitude tests) and some of the checks would be out of place (e.g., checks dealing with two feed entity types).
The validator wouldn't have to rely on any external tools then?
Then, for each rule there must be:
- A pluggable method testing for a given known error
- A unit test (JUnit?) for the failure case, with sample protobuf(s) to go with it
- A wiki entry detailing what is tested and why it's tested
Do we need anything else per test?
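The per-rule structure described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names (`ValidationRule`, `CoordinateBoundsRule`, the "E001" ID) are invented for the sketch, and a real implementation would take a parsed GTFS-rt `FeedMessage` rather than bare coordinates:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a pluggable per-rule structure; all names are assumptions. */
public class RuleSketch {

    /** A pluggable rule: one known error condition, with an ID referenced by the wiki entry. */
    interface ValidationRule {
        String id();                                    // e.g. "E001", matching the wiki entry
        List<String> validate(double lat, double lon);  // simplified input for this sketch
    }

    /** Example rule: vehicle positions must carry valid WGS84 coordinates. */
    static class CoordinateBoundsRule implements ValidationRule {
        public String id() { return "E001"; }
        public List<String> validate(double lat, double lon) {
            List<String> errors = new ArrayList<>();
            if (lat < -90 || lat > 90)   errors.add(id() + ": latitude out of range: " + lat);
            if (lon < -180 || lon > 180) errors.add(id() + ": longitude out of range: " + lon);
            return errors;
        }
    }

    public static void main(String[] args) {
        ValidationRule rule = new CoordinateBoundsRule();
        System.out.println(rule.validate(27.9, -82.5));   // valid position, no errors
        System.out.println(rule.validate(127.9, -282.5)); // both coordinates out of range
    }
}
```

A JUnit test for the rule would then just feed it known-good and known-bad sample data and assert on the returned errors.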
Also, for the wiki page, how should it be structured? I thought of: 1. Separating the warnings and errors, as has been done on the GTFS validator wiki page
- A pluggable method testing for a given known rule (these could be implemented as unit/integration tests themselves)
- A unit test (JUnit?) for the success/failure case, with sample protobuf(s) to go with it (or a sample implementation for WebSockets/incremental updates)
- A wiki entry detailing what is tested and why it's tested (the code should also be well-commented)
A way to log either warnings or errors or both would be useful, perhaps with options to filter by specific warnings/errors. If the tool runs over an extended time period, being able to filter out noise you're not interested in and focus on warnings or errors for a few specific rules would be useful. Taking a close look at the command line options of the existing GTFS feed validator would likely provide helpful guidance - https://github.com/google/transitfeed/wiki/FeedValidator#command-line-options.
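One way to get that kind of filtering with no extra dependencies is a `java.util.logging` `Filter` keyed on rule IDs. A minimal sketch, assuming the (invented) convention that every log message starts with its rule ID followed by a colon:

```java
import java.util.Set;
import java.util.logging.ConsoleHandler;
import java.util.logging.Filter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch: restrict validator log output to a few rule IDs (names and conventions assumed). */
public class RuleLogFilter implements Filter {
    private final Set<String> wantedRuleIds;

    public RuleLogFilter(Set<String> wantedRuleIds) {
        this.wantedRuleIds = wantedRuleIds;
    }

    @Override
    public boolean isLoggable(LogRecord record) {
        // Assumed convention for this sketch: messages start with the rule ID, e.g. "E001: ..."
        String msg = record.getMessage();
        int colon = msg.indexOf(':');
        return colon > 0 && wantedRuleIds.contains(msg.substring(0, colon));
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("validator");
        log.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setFilter(new RuleLogFilter(Set.of("E001", "W003"))); // only show these rules
        log.addHandler(handler);

        log.warning("E001: latitude out of range");  // passes the filter
        log.warning("W042: minor timestamp drift");  // suppressed by the filter
    }
}
```

The same rule-ID set could be populated straight from a command-line option, mirroring the transitfeed validator's approach.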
Also, for the wiki page, how should it be structured? I thought of1. Separating the warnings and errors as been done on the GTFS validator wiki page
Yes, I think something like https://github.com/google/transitfeed/wiki/FeedValidatorErrorsAndWarnings would work. Assigning a specific ID to each warning/error seems helpful.

2. Categorizing checks done for each element (in the same order as the specification page)
I'd try to logically group them as closely to the spec as possible - some deviation would be ok, as long as there is general organization as mentioned above.
I'm not entirely clear on how this would work. If it's going to be implemented as unit/integration tests, would that mean using an existing framework? Or something similar to those types of tests, implemented just for the application?
Of course. Should they refer back to the error code (in the wiki), or be completely self-explanatory?
How about using a web interface to display the errors while the validator is running? Since this tool may run for days, the required data might be hidden in a console log filtered by options. A web page that updates in real time would also allow for easy filtering of the data. Of course, a good console log could be used along with a web interface as well.
But is it okay if I start working on this right away in my free time?
Sorry, I wasn't very clear here. At a high level I had something in mind where the logic for the rules is written with something like the Java "assert" statement:
http://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html
Generally, I was thinking it would be good to try to leverage an existing framework that is intended to evaluate pass/fail conditions, to make it clear where the rules are evaluated in the code and simplify the required boilerplate code as much as possible.
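For illustration, here is how a single rule might be expressed with Java's assert statement, per the guide linked above. This is a hedged sketch, not an existing implementation: the class name, the "W001" ID, and the specific rule are all assumptions (the only spec-backed fact used is that GTFS-rt timestamps are POSIX time in seconds). Note that assertions must be enabled at runtime with `java -ea`:

```java
/** Hypothetical sketch of an assert-based rule check; names and the rule are assumptions. */
public class AssertRuleSketch {

    /** Rule logic kept in a plain method so unit tests can call it directly. */
    static boolean timestampIsPosixSeconds(long timestamp) {
        // GTFS-rt timestamps are POSIX seconds; a zero value or a value in
        // milliseconds (13 digits) would fail this check.
        return timestamp > 0 && timestamp < 10_000_000_000L;
    }

    public static void main(String[] args) {
        long headerTimestamp = 1_430_000_000L; // sample POSIX-seconds value
        // With -ea, a failing assert raises AssertionError carrying the rule's message.
        assert timestampIsPosixSeconds(headerTimestamp)
                : "W001: header timestamp is not in POSIX seconds";
        System.out.println("header timestamp check passed");
    }
}
```

Because plain asserts are disabled by default, wrapping the same boolean logic in JUnit assertions (which always run) may be the safer framework choice in practice.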
Of course. Should they refer back to the error code (in the wiki), or be completely self-explanatory?
Upon further thought, it's probably better to version the "wiki" documentation as well, which on GitHub would mean bundling a Readme and other markdown-encoded documentation *within* the repo, vs. manually adding a wiki page outside of a GitHub repo. This way, if rules are changed or added, the same commit can reference the change to the code with comments and the change to the documentation. It will be easier to keep both in sync, rather than relying on manually editing the wiki and trying to keep up with the code.
How about using a web interface to display the errors while the validator is running? Since this tool may run for days, the required data might be hidden in a console log filtered by options. A web page that updates in real time would also allow for easy filtering of the data. Of course, a good console log could be used along with a web interface as well.
Yes, I think both will be useful, but I do like having a web interface showing the errors. As referenced in the other thread earlier, this would also allow a 3rd party to host the validation tool, and agencies to plug in URLs to their GTFS/GTFS-rt data to a web page and then see validation output via the same page.
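A minimal results page can even be served with the JDK's built-in `com.sun.net.httpserver` server, with no framework at all. The sketch below is purely illustrative (class and field names are assumptions, and it stops the server immediately; a real tool would keep it running and likely use a proper web framework):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch of a live validation-results page using the JDK's built-in server. */
public class ValidatorWebUi {
    // Validation threads would append results here while the monitor runs.
    static final List<String> results = new CopyOnWriteArrayList<>();

    static String renderPage() {
        StringBuilder html = new StringBuilder("<html><body><h1>GTFS-rt validation results</h1><ul>");
        for (String r : results) html.append("<li>").append(r).append("</li>");
        return html.append("</ul></body></html>").toString();
    }

    public static void main(String[] args) throws IOException {
        results.add("E001: latitude out of range (vehicle 1234)");
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0); // port 0: any free port
        server.createContext("/", exchange -> {
            byte[] body = renderPage().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        System.out.println("Listening on port " + server.getAddress().getPort());
        server.stop(0); // stop immediately in this sketch; a real tool keeps serving
    }
}
```

A hosted version of this page is also where a third party could let agencies enter their feed URLs and view output, as described above.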
But is it okay if I start working on this right away in my free time?
Absolutely! That would be awesome. I was getting ready to set up a new GitHub repo for this, and I stumbled on an old OneBusAway project that is worth taking a good look at:
https://github.com/OneBusAway/onebusaway-gtfs-realtime-munin-plugin
It's a plugin for the Munin monitoring tool:
http://munin-monitoring.org/
...which I know very little about :). But, it seems like it has similar goals in mind. One catch is that onebusaway-gtfs-realtime-munin-plugin doesn't seem to consume any GTFS data currently - just real-time. You might want to get this set up as-is and see what Munin can do and if it could be used for the GTFS-rt validator. And, feel free to fork it on GitHub and make changes.
For example, a Munin plugin's output is just simple text, along the lines of:

graph_title Load average
graph_vlabel load
load.label value
load.value 0.06
Right, rather than having a separate versioning system for the docs, like the GitHub wiki pages, we'll have the documents along with the source code in the same repo? The only possible problem I can see with this approach is collecting requirements before starting the development process (code commits will come after the initial commits of the documentation).
The plugin seems relatively easy to build, but I feel there are some problems that might limit the use of Munin as a base for the validation tool.
Yes, it's fine to start the documentation in a wiki format; when code starts to materialize, the wiki documentation can be transitioned to live in the repo with the code.
Agreed, based on your initial look it sounds like the validation tool would be better without the dependency on Munin. An important part of the tool is a low learning curve and ease of execution, as well as good output appropriate for all our rules, and it sounds like Munin doesn't fit well for those items.
From a quick look the mockup looks good to me so far. Note that it may also be possible for an agency to offer a single GTFS-rt API endpoint which might contain multiple types of updates (Vehicles + Trips) – whether this is currently allowed by the GTFS-rt spec is something we still need to confirm with the community. If so, the easiest way to handle this would be to just add another line of text under the fields saying that if multiple update types are included in the feed, the agency should repeat the URL for each text box. So if Vehicles + Trips were included in the same endpoint, the user would enter:
· GTFS-realtime (Vehicles) – http://my.feed.com/realtime
· GTFS-realtime (Alerts) – xxx
· GTFS-realtime (Trip Updates) – http://my.feed.com/realtime
…and of course the underlying validator code would need to handle it accordingly.
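"Handle it accordingly" could simply mean collapsing repeated URLs before polling, so a combined endpoint is fetched once and the rules for each tagged update type run against the single response. A small sketch (class and method names are assumptions for illustration):

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: collapse repeated endpoint URLs so a combined feed is fetched only once. */
public class EndpointDedup {

    /** Map each distinct URL to the set of update types the user tagged it with. */
    static Map<String, Set<String>> groupByUrl(Map<String, String> typeToUrl) {
        Map<String, Set<String>> urlToTypes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : typeToUrl.entrySet()) {
            urlToTypes.computeIfAbsent(e.getValue(), k -> new LinkedHashSet<>()).add(e.getKey());
        }
        return urlToTypes;
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("Vehicles", "http://my.feed.com/realtime");
        form.put("Trip Updates", "http://my.feed.com/realtime"); // same combined endpoint
        // One fetch per distinct URL; rules then run per update type found in the response.
        System.out.println(groupByUrl(form));
    }
}
```

This keeps the web form simple (one box per update type) while avoiding double-polling the producer's server.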
Sean