Imagine a journey that goes from stop A in zone1 to stop B in zone2 on
route1, then continues to stop C in zone2 on route1, then transfers to
route2 and goes to stop D in zone2.
If there are fare_rules as follows:
fare_id,route_id,contains_id
fare1,route1,zone1
fare1,route2,zone1
fare1,route2,zone2
The trip passes through zones 1 and 2. It uses routes 1 and 2. But it
doesn't use route2 in zone1 -- instead, it uses route1 in zone2. Does
fare1 apply to the trip?
Or does this set of rules describe two different situations -- one
journey on route2 passing through zone1 and zone2, and another on route1
passing through only zone2?
In general, the documentation is not very clear on which rules should be
joined by "and" and which by "or". This is really a consequence of trying to
squash a very complex set of rules down into CSV. I've been thinking
for hours about how to avoid building a complex algebra of rules. The
state machine model I proposed in the previous email is complex
enough.
Is it possible to clarify when fare rules apply? Or does the whole
thing need to be rethought?
I would also appreciate more discussion and clarification on
fare_rules.
Does anyone publish a feed where they specify route_id and a
destination_id and origin_id for one record in fare_rules? What sort
of behaviors does this produce in client applications?
Will, for example, specifying a route_id in a record that also
specifies an origin_id and destination_id narrow the number of trips
this will apply to? (I think this is the case.)
FareExamples (http://code.google.com/p/googletransitdatafeed/wiki/FareExamples)
is silent on cases where these kinds of mixed designations are made.
Aaron
I intend to design an understandable user interface for creating and managing a fare schedule, and I need to answer questions about user needs and about GTFS before I can do this.
While I've generally not needed to mix approaches to fare rule designation in the feeds I publish, and have avoided doing so (meaning: rules using contains_id, origin_id/destination_id, and route_id applying to a single fare attribute), I ask:
1.) How does mixing these approaches work in GTFS?
2.) Are there agencies out there that need to, and are, doing this in their published feeds?
For the sake of discussion, I'll offer my own interpretation of the GTFS for fares with this one example:
FARES / fare_attributes.txt
fare_id,price,currency_type,payment_method,transfers,transfer_duration
local_fare1,1.75,USD,0,,7200
local_fare2,2.15,USD,0,,7200
express_fare1,5.00,USD,0,,7200
SCENARIO 1 / fare_rules.txt
fare_id,route_id,origin_id,destination_id
local_fare1,local_route1,zone1,zone2
local_fare1,local_route2,zone1,zone2
local_fare2,local_route1,zone1,zone3
local_fare2,local_route2,zone1,zone3
express_fare1,express_route,,
express_fare1,local_route1,,
express_fare1,local_route2,,
local_fare1 will apply to any trip from zone1 to zone2 that uses any combination of local_route1 and/or local_route2. express_fare1 will apply to any trip that uses the express_route between any zones, and allows transfers to lower-priced local service. Correct?
Yes, the spec leaves many questions open.
I would like to share with the group how Google interprets
fare_attributes and fare_rules (glossing over some legacy issues),
hoping that this can serve as the basis for a better definition of
what these tables are supposed to mean. It is quite possible that
limitations of the current way of specifying fares show up in the
process, but I would like to first clean up the confusion we have now
and only afterwards start to discuss future improvements.
To keep things simple, I'm only talking about feeds with a single
agency in this message.
I need to begin with defining some terminology.
The result of a trip planner, i.e., that thing for which we want to
compute a fare, I call a _journey_. Along the journey, the passenger
uses pieces of GTFS trips, and I call these pieces _rides_. (Let me
know if there is a better word.) Each ride is in turn a sequence of
_zones_, namely the fare_zones of the GTFS stops it visits, including
the boarding and alighting stop of the ride.
A _fare_ is the thing identified by a fare_id. Each fare has:
- the data from the one row in fare_attributes.txt with its fare_id, namely:
-- a price in a specific currency and a payment method
-- a maximum number of transfers (infinite if not specified)
-- a maximum duration (from the transfer_duration column, infinite
if not specified)
- the data from all the rows in fare_rules.txt with its fare_id, namely:
-- a set of pairs of (origin, destination) zones, possibly empty.
-- a set of contained zones, possibly empty.
-- a set of routes, possibly empty.
In these abstract terms, this makes sense (I hope), but the relation
to CSV syntax is a bit surprising: It doesn't matter how you
distribute
- the pairs of (origin, destination) zones
- the contained zones
- the routes
over all the rows in the CSV table that have the same fare_id! The
elements of those three sets will be collected independently from the
rows of same fare_id, with no regard to how they are combined on
individual rows. However, an origin and a destination are paired only
when they occur in the same row.
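
To make the reading rule concrete, here is a minimal sketch of how the three sets could be collected per fare_id. The function name read_fare_rules and the dictionary keys are made up for illustration; only the column names come from GTFS. The sample data is the fare_rules table from the first message of this thread.

```python
import csv
import io
from collections import defaultdict


def read_fare_rules(text):
    """Collect, per fare_id, the (origin, destination) pairs, contained zones and routes."""
    fares = defaultdict(
        lambda: {"od_pairs": set(), "contains": set(), "routes": set()}
    )
    for row in csv.DictReader(io.StringIO(text)):
        fare = fares[row["fare_id"]]
        origin = row.get("origin_id") or ""
        destination = row.get("destination_id") or ""
        # Origin and destination are paired only because they share a row.
        if origin or destination:
            fare["od_pairs"].add((origin, destination))
        # Contained zones and routes are collected independently of each other.
        if row.get("contains_id"):
            fare["contains"].add(row["contains_id"])
        if row.get("route_id"):
            fare["routes"].add(row["route_id"])
    return dict(fares)


RULES = """fare_id,route_id,contains_id
fare1,route1,zone1
fare1,route2,zone1
fare1,route2,zone2
"""

fares = read_fare_rules(RULES)
print(sorted(fares["fare1"]["routes"]))    # ['route1', 'route2']
print(sorted(fares["fare1"]["contains"]))  # ['zone1', 'zone2']
```

Note that the result would be the same if the three rows were rearranged, for example with route1 listed next to zone2 instead of zone1.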
After this syntactic detour, now back to what we do with these fares.
A fare may or may not be _applicable_ to a sequence of rides (i.e., a
contiguous subsequence of a journey). It is applicable if all of the
following conditions are satisfied.
- The fare's maximum number of transfers is infinite OR is no less
than the number of rides minus one.
- The fare's maximum duration is infinite OR no less than the time
span between the first ride's start and the last ride's end.
- The fare's set of (origin, destination) pairs is empty OR contains
the pair with the first ride's start zone and the last ride's end
zone.
- The fare's set of contained zones is empty OR is *equal* to the set
of all zones that appear in the sequence of rides.
- The fare's set of routes is empty OR *contains* the route of each ride.
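
The five conditions can be sketched as a single predicate. This is only an illustration of the list above, taken literally; the class and field names (Ride, Fare, is_applicable and so on) are invented here and are not part of GTFS, and the times in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Ride:
    route: str
    zones: List[str]  # fare zones of the stops visited, boarding stop first
    start: int        # boarding time, in seconds
    end: int          # alighting time, in seconds


@dataclass
class Fare:
    max_transfers: Optional[int] = None  # None means infinite
    max_duration: Optional[int] = None   # None means infinite
    od_pairs: Set[Tuple[str, str]] = field(default_factory=set)
    contains: Set[str] = field(default_factory=set)
    routes: Set[str] = field(default_factory=set)


def is_applicable(fare: Fare, rides: List[Ride]) -> bool:
    """Check the five conditions for a non-empty sequence of rides."""
    if fare.max_transfers is not None and fare.max_transfers < len(rides) - 1:
        return False
    if fare.max_duration is not None and fare.max_duration < rides[-1].end - rides[0].start:
        return False
    if fare.od_pairs and (rides[0].zones[0], rides[-1].zones[-1]) not in fare.od_pairs:
        return False
    # Contained zones must be *equal* to the zones visited, not a superset.
    if fare.contains and fare.contains != {z for r in rides for z in r.zones}:
        return False
    # Routes must *contain* the route of every ride.
    if fare.routes and not all(r.route in fare.routes for r in rides):
        return False
    return True


# The journey A -> B -> C on route1, then C -> D on route2, from the first message.
journey = [
    Ride("route1", ["zone1", "zone2", "zone2"], start=0, end=1200),
    Ride("route2", ["zone2", "zone2"], start=1500, end=2400),
]
fare1 = Fare(contains={"zone1", "zone2"}, routes={"route1", "route2"})
print(is_applicable(fare1, journey))  # True
```

Under this reading, fare1 from the first message does apply to that journey, because which route is used in which zone plays no role.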
Computing the price of a journey means to find
1) a decomposition of the journey into contiguous subsequences (i.e.,
sequences of rides)
and
2) for each subsequence, an applicable fare.
We do that with the following simple algorithm, starting from the
whole journey and iterating on what is left of it after a fare has
been computed for initial pieces:
1) Find the largest prefix (i.e., contiguous subsequence of rides) for
which there is any applicable fare.
2) Among the applicable fares to that prefix, choose the cheapest.
3) Remove that prefix. If there is a (non-empty) rest, repeat this
procedure on the rest.
4) Output the sum of the prices of all the chosen fares.
This algorithm is "greedy" in that it never gives back a piece of the
journey that it has already bitten off, even if that means it fails to
find a fare for the rest, or fails to find the overall cheapest price.
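
As a rough sketch, the four steps could look like this. The applicability test is passed in as a function so that the example stays small; the toy fares, their prices and the route-only test are all made up for illustration and are not taken from any feed.

```python
def price_journey(rides, fares, is_applicable):
    """Greedy pricing of a journey.

    fares maps a fare_id to a dict with at least a "price" entry.
    Returns the total price, or None if some remaining piece has no fare.
    """
    total = 0.0
    rest = list(rides)
    while rest:
        # Step 1: try prefixes from longest to shortest.
        for length in range(len(rest), 0, -1):
            prefix = rest[:length]
            prices = [f["price"] for f in fares.values() if is_applicable(f, prefix)]
            if prices:
                total += min(prices)   # step 2: cheapest applicable fare
                rest = rest[length:]   # step 3: remove the prefix
                break
        else:
            return None                # no fare even for a single ride
    return total                       # step 4: sum of the chosen fares


def routes_only(fare, rides):
    """Toy applicability test: only the route condition is checked."""
    return all(ride in fare["routes"] for ride in rides)


# Hypothetical fares; each ride is represented by its route name only.
toy_fares = {
    "ab": {"price": 3.0, "routes": {"r1", "r2"}},
    "a": {"price": 1.0, "routes": {"r1"}},
    "bc": {"price": 1.5, "routes": {"r2", "r3"}},
}

print(price_journey(["r1", "r2", "r3"], toy_fares, routes_only))  # 4.5
```

The example also shows the greedy behaviour: the algorithm takes "ab" for the first two rides and then "bc" for the last one (3.0 + 1.5), although "a" followed by "bc" would cover the same journey for 2.5.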
Going forward, I believe the spec should be clear on what a fare is
(in the abstract sense above), how it is read from the CSV tables, and
when it is applicable to a sequence of rides. I am looking for
feedback whether or not what I wrote above is sufficiently clear and
simple to understand. The current state of GTFS, I believe, reflects
the best intentions to keep things easy to understand for feed
providers and therefore to avoid the formality involved in explaining
these tables in full generality, but I agree that it offers less
guidance than it should to developers of feed-consuming programs.
Arno Eigenwillig
Google
--
Google Switzerland GmbH
A component in an (origin, destination) pair may be left empty to
denote "any zone"; this allows fares to be restricted by origin or
destination zone only.
> After this syntactic detour, now back to what we do with these fares.
>
> A fare may or may not be _applicable_ to a sequence of rides (i.e., a
> contiguous subsequence of a journey). It is applicable if all of the
> following conditions are satisfied.
> - The fare's maximum number of transfers is infinite OR is no less
> than the number of rides minus one.
> - The fare's maximum duration is infinite OR no less than the time
> span between the first ride's start and the last ride's end.
> - The fare's set of (origin, destination) pairs is empty OR contains
> the pair with the first ride's start zone and the last ride's end
> zone.
This should read: ... OR contains
a pair whose first component is empty or the first ride's start zone
and whose second component is empty or the last ride's end zone.
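
In code, the corrected condition might be sketched as follows, with an empty string standing for an empty component. The function name od_matches is made up for this example.

```python
def od_matches(od_pairs, start_zone, end_zone):
    """True if the set is empty or some pair matches, treating "" as "any zone"."""
    if not od_pairs:
        return True
    return any(
        origin in ("", start_zone) and destination in ("", end_zone)
        for origin, destination in od_pairs
    )


print(od_matches({("zone1", "")}, "zone1", "zone3"))  # True: any destination
print(od_matches({("zone1", "")}, "zone2", "zone3"))  # False: wrong origin
```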
So does this mean you produce incorrect results for Portland, Oregon?
Portland's transfers are only evaluated at boarding time. I suspect
this is actually true most places -- it certainly is in New York.
Also, it doesn't appear to match your actual results. Consider this
unusually long trip:
http://maps.google.com/maps?f=d&source=s_d&saddr=SW+Farmington+Rd%
2FOR-10+E&daddr=SE+Kelso+Rd&hl=en&geocode=FUB2tQIdmDas-A%
3BFTgGtQId3pe0-A&mra=ls&dirflg=r&ttype=dep&date=03%2F25%
2F10&time=6:12pm&noexp=0&noal=0&sort=&sll=45.423034,-122.697372&sspn=0.338798,0.568542&ie=UTF8&ll=45.578484,-122.56485&spn=0.675725,1.137085&z=10&start=0
> I am looking for feedback whether or not what I wrote above is
> sufficiently clear and simple to understand.
Yes, very clear.
Interesting... SF Muni restricts transfer validity by the time of
alighting at the end, and Google interprets the transfer_duration
column in that way. At first sight, this would suggest we need two
columns: One to restrict the last boarding time for which a fare is
good, and another to limit the final departure time.
> Also, it doesn't appear to match your actual results. Consider this
> unusually long trip:
Thank you for this instructive example.
If it makes anyone at Google feel better, the trip planner on
trimet.org doesn't calculate the fare correctly either. :-)
On Mar 26, 2:11 am, Arno Eigenwillig <arnoegw.c...@gmail.com> wrote:
I'm working to make OTP calculate trimet fares correctly, and I've run
into what I think might be a data bug. I'm posting it in this thread
because it could help us clarify our understanding of the proposed
interpretation.
I'm looking at:
http://developer.trimet.org/schedule/GTFS/20100328/google_transit.zip
Imagine a trip that takes the #43 bus (route id 43), and then the WES
(route id 203). I believe that (assuming we transfer within an hour)
this should require one all-zones fare, if I read this page correctly:
http://trimet.org/fares/transfers.htm
However, the feed does not have a single fare which includes both routes
203 and 43. In fact, route 203 does not share its fare id (58) with any
other routes.
A cursory look shows that the MAX lines seem to have a similar problem
-- all of their fare ids are odd or zero, while all of the buses' fare
ids are even, making it impossible to transfer from a bus to a MAX.
I would anticipate that a simple flag would suffice, toggling between
proof of payment (i.e. the transfer/fare must be valid for the entire
itinerary, up until the passenger leaves the system/exits the last vehicle)
and pay on entry (the transfer/fare is only validated upon entering the last
vehicle used in an itinerary, but there is no [tacitly] enforced limit on how
long that passenger can continue riding that last vehicle, having boarded
and now possessing an expired transfer/fare).
I would further assume that each fare type classified in the feed
would need the ability to be flagged independently... where systems
may have proof of payment for some vehicles/modes, but pay on entry
for other services (subway versus bus).
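
A minimal sketch of how such a per-fare flag could change the duration check. Nothing like this exists in GTFS today; the Validation enum and the function within_duration are hypothetical names, and the 7200 seconds are simply the transfer_duration used in the example fares earlier in this thread.

```python
from enum import Enum


class Validation(Enum):
    PROOF_OF_PAYMENT = "proof_of_payment"  # must stay valid until the final alighting
    PAY_ON_ENTRY = "pay_on_entry"          # checked only when boarding the last vehicle


def within_duration(max_duration, first_start, last_board, last_alight, mode):
    """Duration check under either validation mode; all times in seconds."""
    if max_duration is None:  # infinite duration
        return True
    if mode is Validation.PROOF_OF_PAYMENT:
        checked_at = last_alight
    else:
        checked_at = last_board
    return checked_at - first_start <= max_duration


# Last vehicle boarded after 7000 s and left after 7500 s, with a 7200 s limit.
print(within_duration(7200, 0, 7000, 7500, Validation.PROOF_OF_PAYMENT))  # False
print(within_duration(7200, 0, 7000, 7500, Validation.PAY_ON_ENTRY))      # True
```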
Tim Sobota, Transit Planner
Metro Transit, City of Madison
On Mar 26, 4:11 am, Arno Eigenwillig <arnoegw.c...@gmail.com> wrote:
To clarify, I don't think it would be possible for route 203 (WES
Commuter Rail) & bus route 43 to share a fare_id given the current
specification. Each fare has a different payment_method in
fare_attributes.txt, & bus route 43 has a zone-based fare, whereas
WES has a fixed fare.
Here is my attempt to describe the fare structure at TriMet, as it
relates to our GTFS feed:
- Bus, MAX light rail & Portland streetcar have zone-based fares
- Portland Aerial Tram & WES Commuter Rail have fixed fares
- Each fare_id in fare_attributes.txt contains every zone combination
for the zone-based modes, as well as 2 fares for the fixed-fare modes
- I also duplicate the fare_attributes for bus & rail because each
has a different payment_method
Here is some crude pseudo code of how you could use the fare files as
they exist today to calculate an accurate fare on TriMet:
for each leg in trip:
    leg_fare = cost of leg according to fare_attributes.txt,
               fare_rules.txt
    if transfer from previous fare is possible:
        leg_fare = greater of leg_fare & previous leg_fare
    else:
        trip_fare += leg_fare
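
One possible reading of the pseudo code, written out in Python: each chain of legs connected by valid transfers costs as much as its most expensive leg, and a leg that cannot be reached on a transfer starts a new chain. This is an interpretation, not TriMet's official rule; the function name and the prices are made up, and whether a transfer is possible is simply given as input rather than computed.

```python
def trip_fare(legs):
    """legs: list of (leg_fare, transfer_ok) tuples.

    transfer_ok says whether the leg can be ridden on a transfer from the
    previous leg; it is ignored for the first leg.
    """
    total = 0.0
    chain_fare = None  # highest leg fare in the current chain of transfers
    for leg_fare, transfer_ok in legs:
        if chain_fare is not None and transfer_ok:
            chain_fare = max(chain_fare, leg_fare)
        else:
            if chain_fare is not None:
                total += chain_fare  # close the previous chain
            chain_fare = leg_fare    # start a new one
    if chain_fare is not None:
        total += chain_fare
    return total


# Two legs joined by a transfer, then a third leg that needs a new fare.
print(trip_fare([(2.0, False), (2.5, True), (2.0, False)]))  # 4.5
```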
Hope this helps,
-Mike Gilligan
TriMet
The payment_method should probably be read as applying to the first ride
on a journey.
So, if you put all of the WES stops in their own zone (call it W), you
could make a set of fare rules containing (as in contains_id) each
combination of zones, starting (as in origin_id) in W, and give it a
payment_method of [whatever WES's payment_method is]. Then you could
create another set of fare_rules starting on MAX and containing W, and
another starting on the bus and containing W.
This is necessary because when you buy a full fare on the WES, it comes
with (optionally) a transfer to the bus or MAX, and vice versa, assuming
you buy an all-zones fare. At least, if I understand the TriMet website
right.
> Here is some crude pseudo code of how you could use the fare files as
> they exist today to calculate an accurate fare on TriMet:
>
> for each leg in trip:
>     leg_fare = cost of leg according to fare_attributes.txt,
>                fare_rules.txt
>     if transfer from previous fare is possible:
>         leg_fare = greater of leg_fare & previous leg_fare
>     else:
>         trip_fare += leg_fare
This doesn't tell you how to determine if a transfer from the previous
fare is possible. By possible, do you mean within the transfer_duration and
number of transfers? If so, this seems strictly less powerful than
Arno's proposal, which allows multiple sets of nontransferable fares
within a feed, as would be needed to represent many systems with commuter
rail: Philadelphia (Regional Rail to bus), New York (subway to LIRR), or
I think Boston (subway to commuter rail).
I proposed an algorithm along your lines a while back to deal with New
York's peculiar fare structure, but it had a lot more complexity around
"if transfer from previous fare is possible". But I don't think we need
to go there for Portland. I think Portland's fare structure is simple
enough that Arno Eigenwillig's algorithm could handle it if it were just
encoded a little differently.