realtime updater design and configuration

Andrew Byrd

Aug 15, 2013, 12:10:50 PM

to opentripplanner-dev

Hi again everyone,

On Tuesday Jaap, Jorden, and I had a discussion about applying real-time
updates and configuring realtime graph updater modules. This discussion
is summarized below. Laurent, the subject is very close to your recent
commits. Some of the questions that arise are best answered by you, and
we would be interested in any feedback you could provide. Jaap is ready
to implement the proposals below as part of his realtime update merge.

Realtime updater design
=======================

First, recall that our concurrency control model depends on having
multiple reader threads but a single graph writer thread. As discussed
before, we need to make all write operations non-overlapping and
sequential. This is currently achieved with a ScheduledExecutorService
which automatically re-schedules the same Runnables at regular
intervals, ensuring that they are executed sequentially by a single thread.
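To make the current model concrete, here is a minimal sketch (Java 8+, not OTP code; SingleWriterSchedulerDemo and fakeGraphWrite are hypothetical names). Two tasks are re-scheduled with scheduleWithFixedDelay on one single-threaded ScheduledExecutorService, and a counter records how many "writes" ever run at the same moment:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleWriterSchedulerDemo {
    private static final AtomicInteger running = new AtomicInteger();
    private static final AtomicInteger maxRunning = new AtomicInteger();
    private static final AtomicInteger runs = new AtomicInteger();

    /** Stand-in for one GraphUpdater's periodic run() method. */
    private static void fakeGraphWrite() {
        int now = running.incrementAndGet();
        maxRunning.accumulateAndGet(now, Math::max);
        runs.incrementAndGet();
        try {
            Thread.sleep(2); // pretend to modify the graph
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.decrementAndGet();
        }
    }

    /** Runs two periodic "updaters" for about 100 ms; returns the peak number running at once. */
    static int maxConcurrentWrites() {
        ScheduledExecutorService writer = Executors.newSingleThreadScheduledExecutor();
        writer.scheduleWithFixedDelay(SingleWriterSchedulerDemo::fakeGraphWrite, 0, 5, TimeUnit.MILLISECONDS);
        writer.scheduleWithFixedDelay(SingleWriterSchedulerDemo::fakeGraphWrite, 0, 7, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(100);
            writer.shutdown(); // by default this cancels the periodic tasks
            writer.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxRunning.get();
    }

    static int totalRuns() {
        return runs.get();
    }
}
```

Because the scheduler owns exactly one thread, maxConcurrentWrites() should always return 1; the cost is that a slow task delays every task queued behind it.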

We need to allow both polling and event-driven (message-driven)
updaters. A concrete example of the polling type is a periodic HTTP GET
of a full GTFS-RT data set, and a concrete example of the event-driven
type is applying small GTFS-RT differential messages received over a
persistent Websocket. We first considered having two different subtypes
of GraphUpdater, one for polling and one for event-driven mode. In this
model, polling GraphUpdaters would be a single Runnable triggered at
intervals by a ScheduledExecutorService, as is now done in
PeriodicTimerGraphUpdater. On the other hand, event-driven GraphUpdaters
would be free-running threads that hold a reference to the same global
ScheduledExecutorService; upon receiving a message they would insert a
single non-repeating Runnable into the Executor queue.

Besides the added complexity of having two models for interaction
between the Executor and the GraphUpdaters, this strategy has another
problem. Graph updater tasks need to do some parsing/decoding and
preparation of the incoming data, open network connections, handshake
with servers, etc. We do not want these tasks to stall the lone graph
update thread and postpone execution of other queued GraphUpdater
Runnables (a sluggish bike rental fetch could prevent people from seeing
up-to-date arrival time predictions). Therefore, we propose to give each
GraphUpdater (both polling and event-driven types) its own free-running
thread which handles all server interaction and inserts one-off (not
periodically scheduled) Runnable tasks into a shared, single-threaded
Executor/ExecutorService. The Runnables inserted in that Executor would
perform only the minimum amount of work necessary to respect the version
control model, and would possess references to the pre-fetched,
pre-processed updates that they will apply.
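A rough sketch of the proposed split, assuming Java 8+ and using a List as a stand-in for the Graph (FreeRunningUpdatersDemo, startUpdater and the "bikes"/"trips" updaters are hypothetical). Each updater sleeps on its own thread to simulate a slow fetch, then hands only a tiny apply step to the shared single-threaded executor:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class FreeRunningUpdatersDemo {
    /** Returns the order in which updates were applied by the single writer thread. */
    static List<String> applyOrder() {
        List<String> applied = new CopyOnWriteArrayList<>(); // stand-in for the Graph
        ExecutorService graphWriter = Executors.newSingleThreadExecutor();
        Thread slow = startUpdater("bikes", 200, 1, applied, graphWriter);
        Thread fast = startUpdater("trips", 5, 5, applied, graphWriter);
        try {
            slow.join();
            fast.join();
            graphWriter.shutdown(); // already-queued apply tasks still run
            graphWriter.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return applied;
    }

    /** One free-running updater thread: slow work here, minimal work on the writer. */
    private static Thread startUpdater(String name, long fetchMillis, int count,
                                       List<String> graph, ExecutorService graphWriter) {
        Thread t = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                try {
                    Thread.sleep(fetchMillis); // simulated network fetch and decoding
                } catch (InterruptedException e) {
                    return;
                }
                String update = name + "-" + i; // pre-fetched, pre-processed update
                graphWriter.execute(() -> graph.add(update)); // one-off write task
            }
        }, name + "-updater");
        t.start();
        return t;
    }
}
```

In this sketch all five "trips" updates are applied long before the slow "bikes" fetch completes, because the slow fetch never occupies the writer thread.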

This means that the periodic update triggering logic would be
implemented by free-running GraphUpdater threads rather than a single
ScheduledExecutorService (though a GraphUpdater could internally use a
ScheduledExecutorService for this purpose). Periodic fetch is a common
need and should be implemented in an abstract base class shared by all
GraphUpdaters of the polling type. On the other hand, an event-driven
GraphUpdater would open a persistent connection to a server in its own
thread, registering a "message received" callback that decodes a single
message and inserts a Runnable to handle it in the shared queue. This
logic would probably be quite specific to the protocol and client
library in use.
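One possible shape for the polling base class, as a sketch only: PollingGraphUpdater and fetchAndDecode() are hypothetical names, and a plain Executor stands in for the shared single-threaded writer. Subclasses supply the fetch/decode step; the base class owns the timing loop, error handling, and exit on interrupt:

```java
import java.util.concurrent.Executor;

abstract class PollingGraphUpdater implements Runnable {
    private final long periodMillis;
    private final Executor graphWriter;

    protected PollingGraphUpdater(long periodMillis, Executor graphWriter) {
        this.periodMillis = periodMillis;
        this.graphWriter = graphWriter;
    }

    /** Runs on this updater's own thread; returns the write step, or null if nothing changed. */
    protected abstract Runnable fetchAndDecode() throws Exception;

    @Override
    public final void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Runnable apply = fetchAndDecode();
                if (apply != null) {
                    graphWriter.execute(apply); // only the minimal write step is queued
                }
            } catch (InterruptedException e) {
                return; // shutdown requested during a fetch
            } catch (Exception e) {
                // A failed fetch must not kill the updater: report it and retry next period
                System.err.println("Polling failed: " + e);
            }
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                return; // shutdown requested while waiting
            }
        }
    }
}
```

An event-driven updater would skip the loop and call graphWriter.execute(...) from its "message received" callback instead.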

We have decided to focus on GTFS-RT as a message format, since it is now
available for the Netherlands and we wish to promote its use as part of
the open GTFS standard. Polling and message-driven updaters should of
course re-use the same GTFS-RT decoding and applying logic. Currently
GtfsRealtimeAbstractUpdateStreamer.getUpdates() calls getFeedMessage().
The code is employed by subclassing and overriding getFeedMessage, and
is designed for blocking / pulling operations. (The word "streamer" in
the name is somewhat misleading, and should be changed.) We propose that
this code be refactored as a library usable in different situations. The
preferred location would be as a static method on OTP TripUpdate, i.e.
public static List<TripUpdate> decodeFromGTFSRT(GtfsRealtime.FeedMessage
message) and/or public static List<TripUpdate> decodeFromGTFSRT(byte[]
bytes). The OTP class TripUpdate should perhaps itself be renamed,
because its name is identical to that of a class in the compiled GTFS-RT
protobuf.

We also need to split out and reuse the logic that applies the
TripUpdates to a Graph, which is currently found in
StoptimeUpdater.run(). Again, this might be a static method on OTP's
TripUpdate. That code can then be included in / called by a Runnable
class, each instance of which holds a reference to a List of TripUpdates
and is placed in the shared Executor queue.
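A sketch of how the split-out apply logic and its Runnable wrapper might look. TripUpdate here is a hypothetical stand-in (trip id plus delay), not OTP's class, and a Map plays the role of the Graph; the point is that each task holds an immutable batch of already-decoded updates and does nothing but apply them:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for OTP's internal TripUpdate. */
final class TripUpdate {
    final String tripId;
    final int delaySeconds;

    TripUpdate(String tripId, int delaySeconds) {
        this.tripId = tripId;
        this.delaySeconds = delaySeconds;
    }

    /** Apply logic split out of the updater (here: keep the latest delay per trip). */
    static void applyAll(Map<String, Integer> graphDelays, List<TripUpdate> updates) {
        for (TripUpdate u : updates) {
            graphDelays.put(u.tripId, u.delaySeconds);
        }
    }
}

/** One instance per decoded batch, placed in the shared writer queue. */
final class ApplyTripUpdatesTask implements Runnable {
    private final Map<String, Integer> graphDelays; // stand-in for the Graph
    private final List<TripUpdate> updates;

    ApplyTripUpdatesTask(Map<String, Integer> graphDelays, List<TripUpdate> updates) {
        this.graphDelays = graphDelays;
        this.updates = Collections.unmodifiableList(new ArrayList<>(updates)); // snapshot
    }

    @Override
    public void run() {
        TripUpdate.applyAll(graphDelays, updates);
    }
}
```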

Ideally we will eventually merge Alert and TripUpdate handling. Both are
GTFS-RT feed entity types and can be parsed by the same methods. We will need
to ensure that multiple GraphUpdaters using the GTFS-RT decoder and
applier logic (polling Alerts and streaming TripUpdates) can function at
once, which should be no problem in this model.

PeriodicTimerGraphUpdater should be renamed as it will no longer
necessarily be periodic. It would contain a list of all GraphUpdaters
and their threads, a reference to the single shared Executor, and
perhaps other supporting data. It will have to contain logic for
managing and shutting down the updater threads. It probably deserves its
own field in the Graph rather than being a service in the String-keyed
map. Below we will refer to this new class as a GraphUpdaterManager.

GraphUpdaterRunnables currently have a run method which is wrapped to
create anonymous Runnable classes, which are then handed to the
Executor. GraphUpdaters and tasks to be placed in the global Executor
queue could instead just implement Runnable directly.

We also considered placing lists of internal update messages into a
single shared, threadsafe queue rather than placing arbitrary Runnables
in a shared Executor. However, this strategy requires a message type for
every possible graph modification operation and limits the flexibility
of GraphUpdaters. Since Runnables can do just about anything, we need
well-documented and enforced conventions, perhaps materialized as an
abstract base class, to ensure that GraphUpdaters work as expected.

PreferencesConfigurable
=======================

We also want to configure these GraphUpdaters via Preferences files
placed in the Graph's directory. Laurent has implemented this. We have a
few questions about that implementation and suggestions for updating it.
GraphRuntimeConfigurator currently goes through a "bean"-like process of
instantiating graph updaters via reflection, then configuring them. We
would prefer to improve encapsulation and rely less on functions with
side effects. We are also facing a potential tangle of new classes and
interfaces. Questions:

1. Could we get rid of the PreferencesConfigurable interface and just
merge it into GraphUpdater? Is there anything we need to configure with
preferences that is not in some sense a graph updater thread?

2. Do the configurators need to be separate from the configured? Can we
for example merge BikeRentalConfigurator into BikeRentalUpdater2 since
BikeRentalConfigurator always makes an instance of this class?

Here is some example skeleton code (not compiled or tested):

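import java.util.List;
import java.util.prefs.Preferences;

public interface GraphUpdater extends Runnable {
    /** Configures this updater from a Preferences node and returns it. */
    public GraphUpdater fromPreferences(Preferences p, Graph g);
}

public class ExampleGraphUpdater implements GraphUpdater {
    private Graph graph;
    private String url;           // read from p
    private long pollingPeriodMs; // read from p

    /** Factory-style method: configures this instance and returns it */
    @Override
    public GraphUpdater fromPreferences(Preferences p, Graph g) {
        this.graph = g;
        //...
        return this;
    }

    /** Run "forever", polling or handling message-driven callbacks */
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(pollingPeriodMs);
            } catch (InterruptedException e) {
                return; // interrupted on shutdown: stop this updater
            }
            // Fetch and decode on this thread, not on the graph writer thread.
            // poll(), decode() and GraphUpdateTask are not shown here.
            List<TripUpdate> updates = decode(poll(url));
            graph.getGraphUpdaterManager().execute(
                    new GraphUpdateTask(updates));
        }
    }
}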

Instead of using reflection to instantiate a type, I would even prefer a
hard-coded switch enumerating all the options:

import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

public class GraphUpdaterConfigurator {
    // what is currently known as GraphRuntimeConfigurator
    public void setupGraph(Graph graph, Preferences config)
            throws BackingStoreException {
        GraphUpdaterManager gum = new GraphUpdaterManager(graph);
        // ...
        for (String configurableName : config.childrenNames()) {
            Preferences prefs = config.node(configurableName);
            String type = prefs.get("type", null);
            GraphUpdater gu = null;
            if ("bike-rental".equals(type)) {
                gu = new BikeRentalUpdater2().fromPreferences(prefs, graph);
            } else if ("gtfs-rt".equals(type)) {
                gu = new GTFSRTGraphUpdater().fromPreferences(prefs, graph);
            }
            if (gu != null) {
                gum.add(gu);
            } // else: unknown or missing type, should be logged
        }
        if (!gum.isEmpty()) {
            graph.setGraphUpdaterManager(gum);
            gum.startGraphUpdaters();
        }
    }
}

Finally, can the EmbeddedConfigService be eliminated and replaced with a
field on the Graph itself (Properties embeddedPreferences = ...)? Do we
really need embeddedConfig or can we just serialize a
GraphUpdaterManager and GraphUpdaters along with the Graph?

Thanks to anyone who made it all the way through that post!

-Andrew

Aaron Bannert

Aug 15, 2013, 2:01:31 PM

to opentripplanner-dev

I'm excited to see progress on realtime updates for OTP, this is great stuff!

I have some suggestions though that might help simplify this whole design. Since implementing a pull-based approach inside a long-running OTP process is going to be fraught with complexities and problems, why not make all updates go through a push-style interface instead? That way you can avoid problems like having to trust DNS not to lie to you, dealing with temporary outages (how do you notify the OTP operator that it's failing?), or implementing all sorts of edge-case handling inside of OTP. Instead, just post GTFS-RT messages straight to a servlet running inside OTP that interprets the results and calls a GraphUpdater interface.

In other words, I suggest this much simpler architecture:

1) Create a simple Java interface for changing the currently-loaded graph. Implement this however you like as you described below, just make it thread-safe.
2) Create a servlet for handling POSTed GTFS-RT messages which calls the interface in #1.
3) If you want a polling-style interface, write an external program that does a GET from the GTFS-RT sources, then POSTs to the servlet in #2. Run it under cron or wrap it in a script to repeat. Make it smart about caching-related headers (if-modified-since, cache-control, etag, etc…).
4) If you want OTP security, use your servlet container's capabilities (e.g. https, basic auth, client SSL cert verification, etc…). If you want client security, implement it inside the external fetching program, and outside the critical sections of the graph updater.
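For comparison, a minimal sketch of steps 1 and 2 of this proposal. It assumes Java 11+ and uses the JDK's built-in com.sun.net.httpserver.HttpServer in place of a servlet container; the "/gtfs-rt" path, PushEndpointDemo and the byte-counting "apply" step are hypothetical. The handler reads the POSTed body, queues the write on the single graph-writer thread, and answers 202 Accepted:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PushEndpointDemo {
    /** Starts the endpoint, POSTs one body to it, and returns the number of bytes applied. */
    static int postOnce(byte[] body) {
        AtomicInteger appliedBytes = new AtomicInteger(); // stand-in for graph state
        ExecutorService graphWriter = Executors.newSingleThreadExecutor();
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/gtfs-rt", exchange -> {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1);
                    exchange.close();
                    return;
                }
                // A real handler would decode the GTFS-RT FeedMessage here.
                byte[] message = exchange.getRequestBody().readAllBytes();
                graphWriter.execute(() -> appliedBytes.addAndGet(message.length));
                exchange.sendResponseHeaders(202, -1); // accepted, applied asynchronously
                exchange.close();
            });
            server.start();
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/gtfs-rt").toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            int status = conn.getResponseCode();
            if (status != 202) {
                throw new IllegalStateException("unexpected HTTP status " + status);
            }
            graphWriter.shutdown(); // queued writes still run
            graphWriter.awaitTermination(1, TimeUnit.SECONDS);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
            graphWriter.shutdownNow();
        }
        return appliedBytes.get();
    }
}
```

The handler still funnels every write through one thread, so the graph updater's thread-safety requirement remains; it just sits behind the HTTP interface.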

Pros:
- more scalable (moves some processing outside OTP and allows for scaling that to multiple other boxes)
- more efficient (no long-lived, lingering HTTP client threads inside the servlet container)
- simplified OTP design
- allows more flexibility for operators (they can monitor the external scripts to make sure everything is working)
- allows multiple update streams (e.g. from multiple agencies)
- extendable for other input types (someone can write a custom handler for their own realtime data feeds)
- avoids some security pitfalls, allows for other security capabilities

Cons:
- some programs will have to run outside the servlet container
- still have to make the GraphUpdater thread-safe and very efficient, to avoid performance problems (e.g. if there are multiple streams coming in multiple times per minute each)

Thoughts?

-aaron

> --
> You received this message because you are subscribed to the Google Groups "OpenTripPlanner Developers" group.

Andrew Byrd

Aug 16, 2013, 4:26:43 AM

to opentripp...@googlegroups.com

Hi Aaron,

Thanks for your interest and your suggestions. Once the GTFS-RT parsing
and applying code is made more general, it would indeed be possible to
add a secured REST resource class that accepts pushed GTFS-RT messages.
However, I still think we benefit from a message-based protocol over a
persistent connection.

The reason for using minimally framed, message-based protocols is the
quantity and frequency of messages we must handle. In the Netherlands we
receive a minimum of one message from each active vehicle per minute,
and these are not aggregated. At the moment we are receiving between 20
and 25 individual GTFS-RT messages per second. Wrapping each of them in
an HTTP request would increase network traffic and server load.

One basic objective of this project was to avoid introducing
aggregation, latency or jitter in the message stream before consumption.
We would like to present time-sensitive information to users with
near-zero lag, and the time for an update to travel from the vehicle to
OTP is now about 2 seconds. We would also like to ensure message
ordering, which will be harder if received by multiple threads.

I expect more systems of this type to appear, since I know other
European cities are considering incremental message passing systems
drawing on the Dutch experience.

Of course your design has its own merits, and is informed by your
experience providing security and stability. Can you explain why a
long-lived HTTP client thread inside OTP would be problematic (as
opposed to one in an external process)? Why is it necessarily more
complex to handle temporary outages (informing operators), edge cases
etc. inside OTP than in an external process?

On the subject of scalability, it is not clear to me how receiving the
messages in a separate process (even on a separate machine) and then
re-posting them to the OTP server would be more efficient. I suspect
that the HTTP processing and network use would be worse. In any case,
I'm not too worried about needing to scale this task horizontally: the
load on my laptop from handling those 25 messages per second is barely
measurable and network bandwidth consumed is about 10kB/sec. This
subject has generated a lot of discussion, but the final implementation
will not be particularly complex.

Again, the determining factor here is the number of messages and their
time-sensitivity. In other scenarios (infrequent polling) your solution
makes a lot of sense, and would certainly be straightforward to
implement. I would expect it to reuse a lot of code, and involve only
one simple REST resource class on the OTP side.

-Andrew

Laurent GRÉGOIRE

Aug 16, 2013, 6:19:10 AM

to opentripplanner-dev

Hi Andrew,

On 15/08/2013 18:10, Andrew Byrd wrote:
> [...] Laurent, the subject is very close to your recent
> commits. Some of the questions that arise are best answered by you, and
> we would be interested in any feedback you could provide. Jaap is ready
> to implement the proposals below as part of his realtime update merge.

Thanks for the detailed update on your work! My comments below.

> Besides the added complexity of having two models for interaction
> between the Executor and the GraphUpdaters, this strategy has another
> problem. Graph updater tasks need to do some parsing/decoding and
> preparation of the incoming data, open network connections, handshake
> with servers, etc. We do not want these tasks to stall the lone graph
> update thread and postpone execution of other queued GraphUpdater
> Runnables (a sluggish bike rental fetch could prevent people from seeing
> up-to-date arrival time predictions). Therefore, we propose to give each
> GraphUpdater (both polling and event driven types) its own free-running
> thread which handles all server interaction and inserts one-off (not
> periodically scheduled) Runnable tasks into a shared, single-threaded
> Executor/ExecutorService. The Runnables inserted in that Executor would
> perform only the minimum amount of work necessary to respect the version
> control model, and would possess references to the pre-fetched,
> pre-processed updates that they will apply.

+1 for separating the updater threads from a single per-graph writer
thread that ensures thread safety.

Furthermore having one single thread / point of access for graph updates
could help for unforeseen needs in the future (logging, filtering,
checking for duplicates, etc...)

As for having one thread per updater, we could also use the fact that
the standard Java "executor" framework allows for a pool of threads. That
may be a good trade-off between one single thread and one thread per
client: a blocking client would then not lock up all updaters. Using the
Java executor framework also makes sure some nasty details of running
background tasks are properly handled by the library itself. The
remaining question would be how to configure the number of threads in
the pool.

> This means that the periodic update triggering logic would be
> implemented by free-running GraphUpdater threads rather than a single
> ScheduledUpdaterService (though a GraphUpdater could internally use a
> ScheduledUpdaterService for this purpose). Periodic fetch is a common
> need and should be implemented in an abstract base class shared by all
> GraphUpdaters of the polling type. On the other hand, an even-driven
> GraphUpdater would open a persistent connection to a server in its own
> thread, registering a "message received" callback that decodes a single
> message and inserts a Runnable to handle it in the shared queue. This
> logic would probably be quite specific to the protocol and client
> library in use.

I may not fully understand the proposed plan, but wouldn't it be
simpler for event-driven threads to push updates directly to an
update-object queue instead of pushing a Runnable that makes the update?
If many types of update objects are needed, the updater thread/queue
could handle this by providing several classes, or by delegating the
update mechanism to a polymorphic update method on the update object
itself. Well, those are probably details anyway, just my two cents.

Also, I would not favor using the ubiquitous Runnable interface but, as
already discussed, would create a specific interface. That could be
better for both readability and reverse engineering (I'm thinking of
call-hierarchy source navigation).

+1 for all of this. I was also thinking of making the updater a Graph
field, as it will probably become an important and mandatory service.
In terms of readability it's way simpler too.

> GraphUpdaterRunnables currently have a run method which is wrapped to
> create anonymous Runnable classes, which are then handed to the
> Executor. GraphUpdaters and tasks to be placed in the global Executor
> queue could instead just extend Runnable directly.
>
> We also considered placing lists of internal update messages into a
> single shared, threadsafe queue rather than placing arbitrary Runnables
> in a shared Executor. However, this strategy requires a message type for
> every possible graph modification operation and limits the flexibility
> of GraphUpdaters. Since runnables can do just about anything, we need
> well-documented and enforced conventions, perhaps materialized as an
> abstract base class, to ensure that GraphUpdaters work as expected.
>
> PreferencesConfigurable
> =======================
>
> We also want to configure these GraphUpdaters via Preferences files
> placed in the Graph's directory. Laurent has implemented this. We have a
> few questions about that implementation and suggestions for updating it.
> GraphRuntimeConfigurator currently goes through a "bean"-like process of
> instantiating graph updaters via reflection, then configuring them. We
> would prefer to improve encapsulation and rely less on functions with
> side effects. We are also facing a potential tangle of new classes and
> interfaces. Questions:
>
> 1. Could we get rid of the PreferencesConfigurable interface and just
> merge it into GraphUpdater? Is there anything we need to configure with
> preferences that is not in some sense a graph updater thread?

As for merging the two, at first sight I would have preferred to keep a
separate interface, but as I do not see any way in Java to enforce a
constructor signature through an interface, this is probably the best
way to go.

As for other configuration types, I don't see anything for now. Maybe,
as discussed with "povder" on the dynamic car speed proposal, some new
updaters could come, but they would be "updaters" too.

> 2. Do the configurators need to be separate from the configured? Can we
> for example merge BikeRentalConfigurator into BikeRentalUpdater2 since
> BikeRentalConfigurator always makes an instance of this class?

They do not need to be. I kept them separated to prevent the updater
stuff from depending on the configuration stuff, and to make it easier
to keep Spring/DI backward-compatibility. Having one or two classes is
more a matter of taste.

> Here is some example skeleton code (not compiled or tested):
>
> public interface GraphUpdater extends Runnable {
> public GraphUpdater fromPreferences(Preferences p, Graph g);
> }
>
> public class ExampleGraphUpdater implements GraphUpdater () {
> Graph graph;
> /** Factory method that produces a concrete instance */
> @Override
> public GraphUpdater fromPreferences(Preferences p, Graph g) {
> this.graph = g;
> //...
> }
> /** Run "forever", polling or handling message-driven callbacks */
> @Override
> public void run() {
> while (not interrupted) {
> sleep();
> message = decode(poll(url));
> graph.getGraphUpdaterManager().execute(
> new GraphUpdateTask(message));
> }
> }
> }

As said before, for the record, I would just drop the Runnable
interface and add a run() method (with its proper throws clause if
needed) to the GraphUpdater interface itself.

> Instead of using reflection to instantiate a type, I would even prefer a
> hard-coded switch enumerating all the options:
>
> public class GraphUpdaterConfigurator {
> // what is currently known as GraphRuntimeConfigurator
> public void setupGraph(Graph graph, Preferences prefs) {
> GraphUpdaterManager gum = new GraphUpdaterManager(graph);
> // ...
> for (String configurableName : config.childrenNames()) {
> Preferences prefs = config.node(configurableName);
> String type = prefs.get("type", null);
> GraphUpdater gu = null;
> if (type.equals("bike-rental")) {
> gu = BikeRentalUpdater2.fromPreferences(prefs, graph);
> } else if type.equals("gtfs-rt")) {
> gu = GTFSRTGraphUpdater.fromPreferences(prefs, graph);
> }
> gum.add(gu);
> }
> if ( ! gum.isEmpty()) {
> graph.setGraphUpdateManager(gum);
> gum.startGraphUpdaters();
> }
> }
> }

I agree it's way simpler with a switch, but this prevents dynamically
injecting new configurators from external libraries, as I was proposing
in an earlier thread for dynamic car speed changes. But I'm not sure
this is an important feature, and we can probably live without it.

We could also use a factory interface (with the factory object both
providing the key and creating the updater), or use reflection and call,
on a provided class, either 1) a static method "fromPreferences" or 2) a
constructor taking the preferences as a parameter. It's up to you!
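A sketch of the factory-interface option, for reference. GraphUpdaterFactory and GraphUpdaterRegistry are hypothetical names, java.util.Properties stands in for java.util.prefs.Preferences, and the created updater is just a Runnable:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Each factory supplies its own type key and builds a configured updater. */
interface GraphUpdaterFactory {
    String key(); // e.g. "bike-rental" or "gtfs-rt"

    Runnable create(Properties config);
}

final class GraphUpdaterRegistry {
    private final Map<String, GraphUpdaterFactory> factories = new HashMap<>();

    void register(GraphUpdaterFactory factory) {
        if (factories.putIfAbsent(factory.key(), factory) != null) {
            throw new IllegalArgumentException("duplicate updater type: " + factory.key());
        }
    }

    /** Looks up the factory named by the "type" key; unknown types are rejected. */
    Runnable create(Properties config) {
        String type = config.getProperty("type");
        GraphUpdaterFactory factory = factories.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("unknown updater type: " + type);
        }
        return factory.create(config);
    }
}
```

External libraries could in principle contribute factories through java.util.ServiceLoader; the hard-coded switch avoids that machinery entirely.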

> Finally, can the EmbeddedConfigService be eliminated and replaced with a
> field on the Graph itself (Properties embeddedPreferences = ...)? Do we
> really need embeddedConfig or can we just serialize a
> GraphUpdaterManager and GraphUpdaters along with the Graph?

It's way simpler to embed the preferences as keys on the graph itself, I
agree. I made it part of a keyed service, again to prevent too many
dependencies from the core on the configurator code, but it's probably
not worth the trouble. Having this field would indeed be simpler.

If we serialize the updater objects/threads themselves with the graph,
we could run into initialization issues (I'm thinking of starting up
threads, network connections, etc.) and this could soon become a
maintenance burden, especially since this mode would not be used that
often. This could be mitigated, but at the expense of making the
updaters themselves aware of these serialization issues. Also, an
updater created during graph building could find its environment
changed by graph startup, as the two stages are not done at the same
time or in the same environment. Embedding the properties themselves is
probably way safer in terms of maintenance and compatibility, and
simpler as well.
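A small sketch of the "embed the properties, rebuild the updaters" approach using plain Java serialization. GraphStub and its fields are hypothetical stand-ins for Graph; the configuration survives a save/load round trip, while the transient live-updater field comes back null and would have to be recreated at startup:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;

class GraphStub implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Plain configuration data: safe to serialize with the graph. */
    Properties embeddedPreferences = new Properties();

    /** Live threads and connections: never serialized, rebuilt after loading. */
    transient Object graphUpdaterManager;

    /** Saves and reloads a graph, as graph building followed by server startup would. */
    static GraphStub roundTrip(GraphStub graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(graph);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GraphStub) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```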

HTH,

--Laurent

Andrew Byrd

Aug 16, 2013, 8:36:48 AM

to opentripp...@googlegroups.com

On 08/16/2013 12:19 PM, Laurent GRÉGOIRE wrote:
> As for having one thread per updater, we could also use the fact that
> the standard Java "executor" framework allow for a pool of threads. So
> it may be a good trade-off between one single thread and one thread per
> client: a blocking client thus would not lock all updaters. Using the
> Java executor framework also make sure some nasty details of running a
> background tasks are properly handled by the library itself. The
> question would be to configure the number of threads in the pool.

Yes, the Executor API is nice and should be simpler for us than
low-level thread creation and management. We could have one
SingleThreadExecutor for the graph-writer tasks and one CachedThreadPool
for the fetch/decode tasks. Presumably both would be held by a
GraphUpdaterManager (ex-PeriodicTimerGraphUpdater) instance on a
per-Graph basis. The CachedThreadPool will grow without bound to
accommodate long-running updaters. The SingleThreadExecutor assumes we
are putting Runnables on the queue, not update objects (see
below).
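A sketch of such a manager, assuming Java 8+; the class and method names are hypothetical. Updaters run on the cached pool, writes go through the single-threaded executor, and stop() interrupts the updaters before draining the writes they already queued:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class GraphUpdaterManager {
    // Long-running fetch/decode updaters; the pool grows as needed.
    private final ExecutorService updaterPool = Executors.newCachedThreadPool();
    // The lone graph writer: tasks run one at a time, in submission order.
    private final ExecutorService graphWriter = Executors.newSingleThreadExecutor();

    /** Starts a long-running updater (polling loop or message listener). */
    void addUpdater(Runnable updater) {
        updaterPool.execute(updater);
    }

    /** Called by updaters to queue a short graph-modifying task. */
    void execute(Runnable writeTask) {
        graphWriter.execute(writeTask);
    }

    /** Interrupts the updaters, waits for them, then lets queued writes finish. */
    boolean stop(long timeoutSeconds) {
        updaterPool.shutdownNow(); // updaters are expected to exit when interrupted
        try {
            boolean updatersDone = updaterPool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            graphWriter.shutdown(); // already-queued writes still run
            return updatersDone && graphWriter.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            graphWriter.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Stopping the updaters first matters: if the writer were shut down while an updater was still running, that updater's next execute() call would throw RejectedExecutionException.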

> I may not understand fully what the proposed plan is, but wouldn't it be
> simpler for event-driven threads to push updates to an update-object
> queue directly instead of pushing a Runnable making the update?

We discussed this possibility, but it requires us to have a class
representing each possible kind of update. We currently have them for
trip updates (to provide an internal lingua franca for multiple RT feed
formats), but do we want to create event objects for every kind of graph
update (add bicycle rental station, etc.)? In the short term it will be
expedient to submit instances of a class implementing Runnable to the
queue, and constrain how those Runnables behave by convention rather
than force.

If we use a work queue rather than a SingleThreadExecutor, the lone
graph writer thread would just block waiting to consume update items off
the work queue. It might even be managed by the same CachedThreadPool as
the fetch/decode tasks.
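For comparison, a sketch of that work-queue variant. GraphUpdate, GraphWriterLoop and the Map standing in for the Graph are hypothetical; every kind of change needs its own GraphUpdate implementation, which is the cost mentioned above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** One graph modification, applied on the writer thread. */
interface GraphUpdate {
    void apply(Map<String, Integer> graph); // Map stands in for Graph
}

/** The lone graph writer: blocks on the queue and applies updates in arrival order. */
final class GraphWriterLoop implements Runnable {
    private final BlockingQueue<GraphUpdate> queue = new LinkedBlockingQueue<>();
    final Map<String, Integer> graph = new HashMap<>(); // touched only by the writer thread

    /** Safe to call from any updater thread. */
    void submit(GraphUpdate update) {
        queue.add(update);
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.take().apply(graph); // blocks until an update is available
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdown requested
        }
    }
}
```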

> Also I would not favor using the ubiquitous Runnable interface but, as
> already discussed, creating a specific interface. It could be better for
> both readability and reverse engineering (I'm thinking of call-hierarchy
> source navigation).

I think we all favor having separate interfaces or base classes
(GraphUpdateReceiver/GraphUpdateApplier) to make things more clear and
readable. The issue here is a relatively minor one of style: inheritance
versus composition. Should GraphUpdateReceiver/Appliers extend/implement
Runnable, or instead be wrapped in a Runnable for submission to the
Executor? In my opinion the latter adds a bit of unnecessary code and
complexity. In either case, the GraphUpdaterManager would only allow you
to submit(GraphUpdateReceiver/Applier), not submit(Runnable).
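A sketch of the inheritance option, with hypothetical names: GraphWriterRunnable is a marker sub-interface of Runnable, and the manager's submit() accepts only that type:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Marker for tasks allowed on the graph writer thread. */
interface GraphWriterRunnable extends Runnable {
}

final class TypedGraphUpdaterManager {
    private final ExecutorService graphWriter = Executors.newSingleThreadExecutor();

    /** Only entry point; passing a variable declared as plain Runnable will not compile. */
    void submit(GraphWriterRunnable task) {
        graphWriter.execute(task);
    }

    /** Lets queued tasks finish, then stops the writer thread. */
    boolean stop() {
        graphWriter.shutdown();
        try {
            return graphWriter.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

One caveat: because GraphWriterRunnable has a single abstract method, a lambda passed to submit() converts to it directly, so the restriction documents intent and rejects stray Runnable-typed variables rather than fully enforcing the convention.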

> +1 for all of this. I was also thinking to make the updater a Graph
> field as it will probably become an important and mandatory service. In
> term of readability it's way simpler too.

In the long term I would actually lean toward removing the Services Map
from Graph, and explicitly listing out all extensions in separate
fields. There are only a few Graph instances per OTP server, so null
fields are harmless.

>> 1. Could we get rid of the PreferencesConfigurable interface and just
>> merge it into GraphUpdater? Is there anything we need to configure with
>> preferences that is not in some sense a graph updater thread?
>
> As for merging the two classes, at first sight I would have preferred to
> keep an interface, but as I do not see any way in Java to ensure
> constructor signature using interface this is probably the best way to go.

We had exactly this conversation at GoAbout :) I can see the reasoning
behind not being able to require constructor signatures, since a
particular concrete subclass may need more information to construct than
the base class / interface it is extending/implementing. Anyway,
required factory methods are workable (and maybe even more readable), as
demonstrated in my sample code. We also agreed that well-documented
convention was more important than devising interfaces that force us to
behave in a particular way.

> As for other configuration types, for now I don't see anything. Maybe as
> discussed with "povder" on the dynamic car speed proposal some new
> updater could come, but they would be "updaters" too.

We also couldn't think of any non-graph-updater uses. Let's just keep it
simple for now and avoid abstracting out configurability until we
actually have a use case.

>> 2. Do the configurators need to be separate from the configured? Can we
>> for example merge BikeRentalConfigurator into BikeRentalUpdater2 since
>> BikeRentalConfigurator always makes an instance of this class?
>
> They do not need to be. I kept them separated to prevent the updater
> stuff from depending on the configuration stuff, and to make it easier
> to keep Spring/DI backward-compatibility. Having one or two classes is
> more a matter of taste.

Great, then I would favor merging them for clarity. Personally I don't
really mind if new features (e.g. graph updater configuration) are not
Spring-aware. Anyway, once the standalone mode has seen some more
testing, we could just make the legacy servlet a wrapper around the
Spring-less code used by the standalone server.

> As said before, for the record, I would just remove the Runnable
> interface and add a run() method (with it's proper throws clause if
> needed) to the GraphUpdater interface itself.

Sorry to make a conversation out of this minor issue, but I think I'm
missing something. What is the downside of extending Runnable, as long
as we make the details of the GraphUpdateManager private and make the
task or updater submission methods only accept our Runnable subinterfaces?
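To make the question concrete, here is a minimal sketch of the inheritance option. The single writer thread comes from a single-threaded `ScheduledExecutorService`, as described at the start of the thread. `GraphWriterRunnable` and the method names on this `GraphUpdateManager` are hypothetical. The manager accepts only the subinterface, so a variable typed as a plain `Runnable` cannot be submitted.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical subinterface: still a java.lang.Runnable, but it is the only
// type the manager accepts for graph writes.
interface GraphWriterRunnable extends Runnable { }

class GraphUpdateManager {
    // A single thread executes every graph write, so writes never overlap.
    private final ScheduledExecutorService writer =
            Executors.newSingleThreadScheduledExecutor();

    // Event-driven updaters: enqueue a one-shot write.
    Future<?> execute(GraphWriterRunnable task) {
        return writer.submit(task);
    }

    // Polling updaters: re-run the same task, waiting periodSec between runs.
    void executePeriodically(GraphWriterRunnable task, long periodSec) {
        writer.scheduleWithFixedDelay(task, 0, periodSec, TimeUnit.SECONDS);
    }

    // Periodic tasks are cancelled on shutdown (default executor policy).
    void stop() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```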

> I agree it's way simpler with a switch, but this prevents dynamically
> injecting new configurators from external libraries, as I was
> proposing in an earlier thread for dynamic car speed changes. But I'm
> not sure this is an important feature and we can probably live without this.

At this stage, I think such dynamic injection would create more problems
than it solves. One of the big style difficulties we have encountered in
OTP is premature generalization and "everything must be a pluggable
framework". Developers are free to fork OTP and add graph updaters for
their own use. If our updater architecture is sound and flexible (which
I expect it is after this conversation), it should be painless to keep
that special-purpose updater code self-contained in a package and
contribute it as a pull request.

> We could also use a factory interface (with the factory object providing
> both the key and creating the updater), or using reflection and calling,
> on a provided class, either 1) a static method "fromPreferences" or 2) a
> constructor with the preferences as parameter. It's up to you!

If we're going to get very pluggable, Updaters could even provide their
own key Strings via an interface method. But as amusing as infinitely
extensible external plugin systems are, I don't really see a strong use
case for them in OTP.
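For comparison, here is a minimal sketch of the factory-interface option Laurent describes above. Each factory object supplies both its configuration key and the updater it builds. All names (`Updater`, `UpdaterFactory`, `UpdaterRegistry`) are hypothetical, and preferences are assumed to be a `Map<String, String>`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal updater type for illustration.
interface Updater {
    String describe();
}

// Each factory supplies both its configuration key and the updater it builds.
interface UpdaterFactory {
    String key();
    Updater create(Map<String, String> prefs);
}

class UpdaterRegistry {
    private final Map<String, UpdaterFactory> factories = new HashMap<>();

    // External libraries could call this to plug in new updater types.
    void register(UpdaterFactory factory) {
        if (factories.putIfAbsent(factory.key(), factory) != null) {
            throw new IllegalStateException("duplicate updater key: " + factory.key());
        }
    }

    Updater create(String key, Map<String, String> prefs) {
        UpdaterFactory factory = factories.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("unknown updater key: " + key);
        }
        return factory.create(prefs);
    }
}
```

Even at this size the extra machinery shows: a registration step, duplicate-key handling and one more interface per updater type.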

> It's way simpler to embed the preferences as keys on the graph itself, I
> agree. I made it part of a keyed service to again avoid too many
> dependencies from the core on the configurator stuff, but it's probably
> not worth the trouble. Having this field would be simpler indeed.

I think configuration of graph updaters is here to stay, and don't mind
dependencies here. My point of view is: don't hesitate to put optional
service fields directly on the Graph!

> If we serialize the updater objects/threads themselves with the graph, we
> could run into initialization issues (I'm thinking of starting up
> threads, network connections, etc.) and this could soon become a
> maintenance burden, especially since this mode would not be used that
> often.

You're right, serializing the updaters could get ugly. It was wrapping
the preferences in an ad-hoc "service" that made me question storing
them in the Graph. If they get their own field with a clear purpose the
current approach should work fine. I do wonder whether we need to embed
preferences at all though.
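Here is a minimal sketch of the "own field with a clear purpose" approach. It uses a simplified, hypothetical `SketchGraph` rather than OTP's real `Graph`. Updater preferences live in a dedicated serializable field. Live updaters own threads and connections, so they are `transient`; they start out empty after loading and are rebuilt from the preferences. Java's built-in object serialization is used here purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical, heavily simplified stand-in for a graph with embedded updater configuration.
class SketchGraph implements Serializable {
    private static final long serialVersionUID = 1L;

    // Updater type -> its preferences, captured at graph build time and saved with the graph.
    final HashMap<String, HashMap<String, String>> updaterPreferences = new HashMap<>();

    // Live updaters are never serialized; they are re-created from the preferences after loading.
    transient List<Object> runningUpdaters = new ArrayList<>();

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        runningUpdaters = new ArrayList<>(); // transient fields would otherwise be null
    }
}

class GraphSerialization {
    static byte[] save(SketchGraph graph) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        return bytes.toByteArray();
    }

    static SketchGraph load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SketchGraph) in.readObject();
        }
    }
}
```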

-Andrew

Laurent GRÉGOIRE

Aug 16, 2013, 10:43:49 AM8/16/13

to opentripplanner-dev

On 16/08/2013 14:36, Andrew Byrd wrote:
> The issue here is a relatively minor one of style: inheritance
> versus composition. Should GraphUpdateReceiver/Appliers extend/implement
> Runnable, or instead be wrapped in a Runnable for submission to the
> Executor? In my opinion the latter adds a bit of unnecessary code and
> complexity. In either case, the GraphUpdateManager would only allow you
> to submit(GraphUpdateReceiver/Applier), not submit(Runnable).

Both solutions would be similar anyway: even when wrapping the
GraphUpdateReceiver/Applier in a Runnable, you still need a method to
call on the GraphUpdateReceiver/Applier itself, I guess.

>> As said before, for the record, I would just remove the Runnable
>> interface and add a run() method (with its proper throws clause if
>> needed) to the GraphUpdater interface itself.
>
> Sorry to make a conversation out of this minor issue, but I think I'm
> missing something. What is the downside of extending Runnable, as long
> as we make the details of the GraphUpdateManager private and make the
> task or updater submission methods only accept our Runnable subinterfaces?

It's just a matter of convenience when reading the code: a call hierarchy
on Runnable::run() in an IDE gives you zillions of hits and renders
the list almost unusable :) But re-reading my comment, it may not have been
clear: the point is not to remove the interface, just to stop using
java.lang.Runnable and add a custom interface instead.

This also allows the run method to be renamed and/or to declare an optional
throws clause, if needed. But those are details anyway.
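Here is a minimal sketch of this suggestion. It uses a custom interface instead of `java.lang.Runnable`, with its own method name and a `throws` clause. The task is wrapped in a `Runnable` only inside the manager, which is the composition option. `GraphWriterTask`, `applyToGraph`, `GraphWriter` and the placeholder `Graph` class are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Placeholder graph: just records which updates were applied.
class Graph {
    final List<String> appliedUpdates = new ArrayList<>();
}

// Hypothetical custom interface: a distinct method name keeps IDE call hierarchies
// separate from Runnable::run, and it may declare checked exceptions.
interface GraphWriterTask {
    void applyToGraph(Graph graph) throws Exception;
}

class GraphWriter {
    private static final Logger LOG = Logger.getLogger(GraphWriter.class.getName());
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Graph graph;

    GraphWriter(Graph graph) { this.graph = graph; }

    // Composition: the task is wrapped in a Runnable only here, so no public
    // method accepts a plain Runnable.
    Future<?> submit(GraphWriterTask task) {
        return writer.submit(() -> {
            try {
                task.applyToGraph(graph);
            } catch (Exception e) {
                // Log here; otherwise submit() would capture the failure silently in the Future.
                LOG.log(Level.WARNING, "graph update failed", e);
            }
        });
    }

    void stop() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With this design, an IDE search for callers of `applyToGraph` lists only graph writes.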

> If we're going to get very pluggable, Updaters could even provide their
> own key Strings via an interface method. But as amusing as infinitely
> extensible external plugin systems are, I don't really see a strong use
> case for them in OTP.

Agreed. If the need arises in the future, it would be straightforward to
switch from a static switch to a more flexible set of factories.
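Here is a minimal sketch of the static switch, assuming each updater's preferences carry a `type` key. The type strings, `UpdaterConfigurator` and the minimal `Updater` interface are hypothetical.

```java
import java.util.Map;

// Hypothetical, minimal updater type for illustration.
interface Updater {
    String describe();
}

final class UpdaterConfigurator {
    private UpdaterConfigurator() { }

    // Static switch: adding an updater type means adding one case here.
    static Updater create(Map<String, String> prefs) {
        String type = prefs.get("type");
        if (type == null) {
            throw new IllegalArgumentException("updater preferences lack a 'type' key");
        }
        switch (type) {
            case "bike-rental":
                return () -> "bike rental updater for " + prefs.get("url");
            case "stop-time-updater":
                return () -> "GTFS-RT trip update updater for " + prefs.get("url");
            default:
                throw new IllegalArgumentException("unknown updater type: " + type);
        }
    }
}
```

Moving to factories later would mainly mean replacing the `switch` with a lookup table. The updaters themselves would not need to change.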

> You're right, serializing the updaters could get ugly. It was wrapping
> the preferences in an ad-hoc "service" that made me question storing
> them in the Graph. If they get their own field with a clear purpose the
> current approach should work fine. I do wonder whether we need to embed
> preferences at all though.

I added preference embedding for two reasons:
1) to make runtime configuration work with the "in-memory" standalone
mode (but that's a side effect; this could have been implemented otherwise);
2) to make the system easier to use: the idea was to have all the
configuration in the same place at graph building time and then have a
single, self-contained Graph.obj.

Also, just as a reminder: if we officially drop Spring/DI support for the
updater mechanism, we can delete both BikeRentalUpdater (and maybe rename
BikeRentalUpdater2) and the old GraphPeriodicUpdater (not sure about the
name) in the servlet module. This should be straightforward, as nothing
depends on them now (except old Spring configs).

HTH,

--Laurent

Aaron Bannert

Aug 16, 2013, 2:48:30 PM8/16/13

to opentripp...@googlegroups.com

Thanks for the comments.

To be clear, my concern isn't so much about the overhead of a persistent connection vs. short-lived connections vs. another message-passing system such as a message bus or some RPC. My concern is about having the core of OTP be concerned with vehicle positions rather than just with graph changes. The whole system should of course be able to handle very high-volume vehicle position updates, but I think we can accomplish that in a way that is both scalable and flexible, and doesn't require that processing to happen in the core of an OTP graph server.

GTFS-RT describes three parts, but let's ignore alerts for a moment (it's trivial), and just talk about vehicle positions vs. trip updates.

Trip updates make perfect sense for the core of OTP (i.e. the GraphUpdater interface), since each trip update is exactly one graph update. I think our core graph-updating interfaces should be designed around trip updates. Vehicle positions, however, are going to be much more problematic, for a few reasons I can think of off the top of my head:

1) when the vehicle runs on time, there is no change to the graph and processing the position just burns CPU (which is the vast majority of the time, even here in San Francisco, where our on-time record is not the best: 70-80% over the last few years)
2) even if the vehicle isn't on time, if it is delayed by the same amount as at the last update ~1 minute ago, there is again no graph change, and it just burns CPU
3) calculating schedule deviation will inevitably become an algorithm focused on predicting vehicle arrivals, and those algorithms can become quite sophisticated and will require some serious horsepower (CPU, memory, maybe even I/O)

Since the core purpose of OTP is trip planning, and since any graph updates inherently require thread safety and critical sections of code, it seems to me that the most prudent design would be to keep any non-essential processing out of the core graph-updating code path (which I think you've already proposed), *and* to remove any CPU-intensive code that doesn't have to do with trip planning from the core OTP servlet container. The code to perform the actions in #1-3 above can operate in a separate process or on a separate box just as well as within the core of OTP, but in an external process the operator has the flexibility to deploy it where most appropriate.

Overall, designing it this way will give much greater scalability, more flexibility in how OTP is deployed, and make it easier for people to write adapters for other forms of realtime updates (since they just have to write something that converts into whatever interface OTP exposes, be it GTFS-RT over persistent connections, HTTP POST messages, or a message bus). It also makes it easier to experiment with new arrival prediction algorithms without having to reload the graph server for each change (which is hard with large graphs).

One other perspective: imagine a large-scale OTP deployment with multiple OTP graph server slaves (for redundancy and capacity), hundreds of agencies and tens of thousands of vehicles, each of which reports in at least once a minute. If each of these OTP slaves needs all 10k messages per minute, each will be dominated by TCP, HTTP, protocol buffer parsing, GTFS-RT parsing, path distance estimates, schedule lookups, and arrival prediction algorithms. That's a heck of a lot of time *not* spent searching the graph for trip plans. In my design, a single external process (one per GTFS-RT source) coalesces vehicle positions into relevant trip updates and forwards those to each OTP server (via message bus, persistent connection or HTTP POST). Each OTP server then only makes small adjustments to the graph and saves the rest of its CPU time for performing graph searches.
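Here is a minimal sketch of the coalescing step Aaron describes, as it might run in the external process. It receives a delay estimate with each vehicle report and forwards a trip update only when the delay has moved by at least a threshold since the last forwarded value. This filters out cases 1) and 2) above. The class name, the per-trip bookkeeping and the threshold are assumptions. Arrival prediction itself (case 3) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical component of the external process: turns frequent per-vehicle delay
// estimates into trip updates, forwarding only those that would change the graph.
final class DelayCoalescer {
    private final int thresholdSec;
    // Last delay forwarded to the OTP servers, per trip ID (absent = assumed on time).
    private final Map<String, Integer> lastForwardedDelay = new HashMap<>();

    DelayCoalescer(int thresholdSec) {
        this.thresholdSec = thresholdSec;
    }

    /**
     * @return the delay to publish as a trip update, or empty when it differs
     *         from the last published value by less than the threshold.
     */
    Optional<Integer> onVehicleDelay(String tripId, int delaySec) {
        int previous = lastForwardedDelay.getOrDefault(tripId, 0);
        if (Math.abs(delaySec - previous) < thresholdSec) {
            return Optional.empty(); // on time or unchanged: nothing is sent to OTP
        }
        lastForwardedDelay.put(tripId, delaySec);
        return Optional.of(delaySec);
    }
}
```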

Does that make sense, and did I miss anything?

-aaron

Andrew Byrd

Aug 16, 2013, 3:25:05 PM8/16/13

to opentripp...@googlegroups.com

Hi Aaron,

I think you'll be happy to learn that our system already works in the
way you described. Our realtime routing code consumes pre-calculated
arrival/departure times, which implicitly include some notion of
position since we can see that arrival times at some stops are in the past.

Thomas has written GTFS-RT producer software which takes per-route
punctuality messages of the type "Route 8 is 5 minutes late", considers
timing points, driving rules, expected maximum speeds, and whatever
other information he can get, then turns them into trip updates
something like:

Route 8
1 PASSED
2 PASSED
3 20:50 (=sched +5)
4 21:00 (=sched +3)
5 21:10 (=sched +2)
6 21:20 (=sched +1)
7 ON TIME
8 ON TIME

The prediction module sits entirely outside OpenTripPlanner, and is
intended to feed multiple servers using different router cores, not all
of which are OTP. OTP mostly just rewrites its timetables based on these
pre-calculated incoming messages. I think we all agree that arrival time
prediction systems for production use are independent systems that can
be run on a separate machine. Ideally they will eventually use
statistical models and take historical data into account, calculations
we want to perform outside any one route planner.
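Here is a minimal sketch of the OTP side, which rewrites timetables from pre-calculated incoming messages. It uses a simplified, hypothetical `SimpleTripTimes` class, not OTP's actual trip-times code. Each stop's realtime arrival is its scheduled arrival plus the predicted delay, and stops marked PASSED are left unchanged.

```java
// Hypothetical, simplified trip timetable: one arrival per stop, in seconds after midnight.
final class SimpleTripTimes {
    private final int[] scheduledArrivals;
    private final int[] realtimeArrivals;

    SimpleTripTimes(int[] scheduledArrivals) {
        this.scheduledArrivals = scheduledArrivals.clone();
        this.realtimeArrivals = scheduledArrivals.clone();
    }

    // delaysSec[i] is the predicted delay at stop i ("ON TIME" = 0),
    // or null for a stop the vehicle has already PASSED (left unchanged).
    void applyTripUpdate(Integer[] delaysSec) {
        if (delaysSec.length != scheduledArrivals.length) {
            throw new IllegalArgumentException("expected one entry per stop");
        }
        for (int i = 0; i < delaysSec.length; i++) {
            if (delaysSec[i] != null) {
                realtimeArrivals[i] = scheduledArrivals[i] + delaysSec[i];
            }
        }
    }

    int[] realtimeArrivals() {
        return realtimeArrivals.clone();
    }

    static int toSeconds(int hours, int minutes) {
        return (hours * 60 + minutes) * 60;
    }
}
```

With the Route 8 example above, the delays would be `{null, null, 300, 180, 120, 60, 0, 0}` seconds. The scheduled times for stops 3 to 6 follow from the example (20:45, 20:57, 21:08, 21:19). The times for the other stops would come from the static timetable.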

-Andrew

Thomas Koch

Aug 17, 2013, 3:23:28 PM8/17/13

to opentripp...@googlegroups.com

Yes, I've written software that generates time updates for public transit in the Netherlands. That covers about 5000 vehicles driving simultaneously during peak hours. These vehicles produce BISON KV6 messages, which are somewhat similar to GTFS-RT VehiclePositions.

Every 60 seconds while en route or at a stop, and on each arrival/departure, a message is sent with the punctuality and trip status. I then use this punctuality and the static timetable (timing points, distances, drive times) to propagate the delay along the trip until it decays to 0 seconds. This information is then available as static GTFS-RT protobuf files and as differential updates via WebSocket.
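Here is a minimal sketch of propagating a delay downstream until it decays to zero. It rests on an assumption not stated in the thread: on each segment, the vehicle can recover at most the slack between the scheduled drive time and a minimum drive time, which might be derived from distance and a maximum speed. `DelayPropagation.propagate` is a hypothetical name, and this does not describe Thomas's actual software.

```java
// Hypothetical sketch: delay shrinks by the recoverable slack on each following segment.
final class DelayPropagation {
    private DelayPropagation() { }

    /**
     * @param currentDelaySec   delay reported at the current stop
     * @param scheduledDriveSec scheduled drive time of each following segment
     * @param minimumDriveSec   fastest plausible drive time of each following segment
     * @return predicted delay at each following stop, never below 0
     */
    static int[] propagate(int currentDelaySec, int[] scheduledDriveSec, int[] minimumDriveSec) {
        if (scheduledDriveSec.length != minimumDriveSec.length) {
            throw new IllegalArgumentException("segment arrays must have equal length");
        }
        int[] predicted = new int[scheduledDriveSec.length];
        int delay = Math.max(0, currentDelaySec); // running early is not modelled here
        for (int i = 0; i < predicted.length; i++) {
            int slack = Math.max(0, scheduledDriveSec[i] - minimumDriveSec[i]);
            delay = Math.max(0, delay - slack);
            predicted[i] = delay;
        }
        return predicted;
    }
}
```

For example, a 300 s delay with 120 s of slack on the first segment and 60 s on each following one yields 180, 120, 60 and 0 s. That is the same shape as the Route 8 example earlier in the thread.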

I think such a system is already contained within OTP (decayingtriptimes), but a more sophisticated prediction system is better suited to a separate package.

On Friday, 16 August 2013 21:25:05 UTC+2, Andrew Byrd wrote:

Ben

Aug 26, 2013, 10:27:50 AM8/26/13

to opentripp...@googlegroups.com

Hello everyone

As usual, I'm sorry for the late response. It's great to see the progress on real time.

We (Moovit) have started working on a new real-time implementation in our system and, as part of that, in OTP.

I have noticed that some of what was discussed here has been committed and merged.

Can I ask what the current status of these changes is, and whether there is anything we can contribute?

Thanks

Ben
