The exceptions listed are the baseball and/or football schedules. Our current scheduling system has no way to determine whether a trip has changed a time or stop, so to be conservative, the whole schedule is added. I would also point out that most, if not all, Hudson Line expresses from Croton-Harmon have Yankees-East 153rd Street added as a stop, so there is more going on than just trips being added. This is the way we will be adding special and one-day schedules going forward. This would include holidays and getaways as well.
In our next iteration of scheduling software, this should not be an issue.
You can choose to do what you want with the data set, but if you eliminate a trip of the same name that has added/removed stops or a time change, you run the risk of eliminating legitimate trips.
For sure there should be a weekday 1810 and we will investigate this. I am sure this is related to bussing. However, from the above, 1810 will still be added for special and one day schedules.
I will say this: we, of all developers, know about the large duplication that occurs with special schedules and one-day schedules. We took great care not to provide separate Monday, Tuesday through Thursday, Friday, Saturday and Sunday base schedules. The same logic for removing duplicated trips cannot simply be applied to the one-off schedules; we would have to compare stops and time differences for each train in the schedule. We know the what and the how; it is the when that is the question. The information is not wrong, does not break trip logic, and conforms to the GTFS standard.
If this poses a systemic problem to apps, well, the risk gets elevated. So if it does cause problems delivering accurate information in a timely manner, please let me know. I can use that....
Thanks for the response, John,
Before my algorithm removes anything, it does compare all stops and times because you are correct, if a trip changes in its stops or times then it truly is not the same trip. So those cannot be removed.
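To make that comparison concrete, here is a rough Python sketch of the kind of check I mean. It is not my actual code. It assumes trips.txt and stop_times.txt have already been read into lists of dicts keyed by the standard GTFS column names (trip_id, trip_short_name, stop_id, arrival_time, departure_time, stop_sequence); the function names are made up for illustration. It only finds candidate duplicates; merging their service calendars would be a separate step.

```python
from collections import defaultdict


def trip_signature(stop_time_rows):
    """Return a hashable fingerprint of a trip: its ordered stops and times."""
    ordered = sorted(stop_time_rows, key=lambda r: int(r["stop_sequence"]))
    return tuple(
        (r["stop_id"], r["arrival_time"], r["departure_time"]) for r in ordered
    )


def find_duplicate_trips(trips, stop_times):
    """Group trip_ids that share a train number AND identical stops and times.

    Trips that differ in any stop or time get different signatures, so they
    are never grouped together and would never be removed.
    """
    rows_by_trip = defaultdict(list)
    for row in stop_times:
        rows_by_trip[row["trip_id"]].append(row)

    groups = defaultdict(list)
    for trip in trips:
        key = (trip["trip_short_name"], trip_signature(rows_by_trip[trip["trip_id"]]))
        groups[key].append(trip["trip_id"])

    # Only groups with more than one member are candidate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]
```

With this approach, a special-schedule copy of a train that gains a stop such as Yankees-East 153rd Street ends up with its own signature and is left alone.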
And yes, the trip information is not wrong and does conform. It does add a burden to low-memory and storage-constrained devices, so that is why I am cleaning up the data. When I come up with a programmatic solution that provides the same data in a more condensed format, taking into account all of the considerations that you have listed, would it be useful to provide the application back to you for possible use? This will probably be done in Java but possibly in Perl or PHP.
Most definitely. We do have some code developed, but anything to push it along is helpful. Post it to me if need be, but I need the justification from the posts even more in order to push it, so thanks for the posts!
Still looking at the Danbury stuff and may repost tomorrow, or Monday at the latest.
Looking at the New Haven Weekend Yankee PDF, I see trip 7533 which is the first trip for a 1:05 weekend game. If I look in the data, I see:
MacbookPro:google_transit brett$ grep 7533 trips.txt
3,2577,3059567,New Haven - Yankees-E153 St.,7533,1,,6,1
MacbookPro:google_transit brett$
So this trip has service ID 2577. Now if I look at:
MacbookPro:google_transit brett$ grep 2577 calendar.txt
2577,0,0,0,0,0,0,0,20110403,20111007
and
MacbookPro:google_transit brett$ grep 2577 calendar_dates.txt
2577,20110403,1
2577,20110501,1
2577,20110522,1
2577,20110612,1
MacbookPro:google_transit brett$
So my understanding is that trip 7533 only runs on 4/3, 5/1, 5/22, and 6/12. But I look at the actual Yankees schedule and see home games on 4/16 and 6/11 (both Saturdays) that are at 1:05pm but do not seem to have references in the data. Is this correct? It does not seem to match the paper schedule, which makes no reference to exceptions for Saturdays, for example.
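For anyone who wants to check my reading of the two files, here is a small Python sketch of how I am resolving the service dates. It assumes the standard GTFS column order for calendar.txt (service_id, monday through sunday, start_date, end_date) and treats exception_type 1 in calendar_dates.txt as "service added" and 2 as "service removed", per the spec. The function name is just for illustration.

```python
from datetime import datetime, timedelta

WEEKDAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"]


def parse_gtfs_date(text):
    """Parse a GTFS YYYYMMDD string into a date."""
    return datetime.strptime(text, "%Y%m%d").date()


def active_dates(calendar_row, exceptions):
    """Return the sorted dates on which a service_id actually runs.

    calendar_row: dict with the weekday flags plus start_date and end_date.
    exceptions:   list of (date, exception_type) string pairs for the same
                  service_id from calendar_dates.txt.
    """
    dates = set()
    day = parse_gtfs_date(calendar_row["start_date"])
    end = parse_gtfs_date(calendar_row["end_date"])
    while day <= end:
        # date.weekday() is 0 for Monday, matching WEEKDAY_COLUMNS order.
        if calendar_row[WEEKDAY_COLUMNS[day.weekday()]] == "1":
            dates.add(day)
        day += timedelta(days=1)

    for date_text, exception_type in exceptions:
        if exception_type == "1":      # service added on this date
            dates.add(parse_gtfs_date(date_text))
        elif exception_type == "2":    # service removed on this date
            dates.discard(parse_gtfs_date(date_text))
    return sorted(dates)


# The rows for service 2577 exactly as they appear in the grep output above.
values = "0,0,0,0,0,0,0,20110403,20111007".split(",")
row_2577 = dict(zip(WEEKDAY_COLUMNS + ["start_date", "end_date"], values))
exceptions_2577 = [("20110403", "1"), ("20110501", "1"),
                   ("20110522", "1"), ("20110612", "1")]

for d in active_dates(row_2577, exceptions_2577):
    print(d.isoformat())
```

Because every weekday flag in the calendar.txt row is 0, the only dates that come out are the four added in calendar_dates.txt, so 4/16 and 6/11 are not covered by this service ID.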
I've been trying to solve this problem since 2008, but right now I
feel like a lot of us are all having similar issues and need to have a
working meeting to try to streamline the process.
There is no doubt in my mind that if we were to spend any amount of
time doing this, it would save John's team and GTFS consumers like
myself a lot of anxiety, frustration, and confusion moving forward.
Chris
Different schedules for weekday day and evenings, weekend day and evenings.
I am tooling with this on the weekends now seeing the interest in it ;)
I think Chris has a good idea, and it could be very beneficial to both sides if we have a well-defined agenda and clearly defined expectations before meeting: what's in scope, what's out of scope, takeaways, etc.
Part of the struggle, IMHO, sometimes stems from not having a good understanding as to "why" data is tagged the way it is; and I realize it may depend heavily on legacy processes and current limitations - not looking to point fingers, just a better understanding. The point is, if as developers we have a better understanding and/or comfort that the data format will remain consistent AND we better understand the decisions that determine the data set, then we should be able to reduce the time it takes to get new sets implemented.
We definitely want to contribute in streamlining the process.
Other thoughts?