I think it would be a good idea once there is a real implementation that
allows multiple graphs. It doesn't have to be anything fancy, but I
tend to not really like new interfaces unless I can actually see
multiple implementations.
> - Is anybody would be interested in this kind of changes? What would
> be the requirements?
We have heard from someone in Spain who wants to run OTP in multiple
cities with separate graphs. So this would definitely be useful there.
I'm excited about the idea, and look forward to seeing it get a bit more
fleshed out.
I think it would be a good idea once there is a real implementation that
allows multiple graphs. It doesn't have to be anything fancy, but I
tend to not really like new interfaces unless I can actually see
multiple implementations.
We have heard from someone in Spain who wants to run OTP in multiple
cities with separate graphs. So this would definitely be useful there.
Code is primarily in two modules:
onebusaway-federations:
https://code.google.com/p/onebusaway/source/browse/#svn%2Fonebusaway-application-modules%2Ftrunk%2Fonebusaway-federations
onebusaway-federations-webapp:
https://code.google.com/p/onebusaway/source/browse/#svn%2Fonebusaway-application-modules%2Ftrunk%2Fonebusaway-federations-webapp
You can see an example of a service interface annotated to indicate
routing hints here:
Thanks,
Brian
I think it might be helpful for OTP to have one interface
(PathService?) that serves as the main interaction point between any
ui modules (the webapp, the api webapp) and the routing+graph backend.
This has lots of nice benefits in that it serves as one single point
where you have to worry about these dynamic routing issues. It allows
for more complex behavior where a webapp (or multiple webapps) can be
running on one machine and the data service (or multiple data
services) can be running on another.
Just to be clear, you don't have to construct the proxy yourself using
the onebusaway-federations module. Instead, you annotate each method
in the target interface with a hint @Annotation that describes how the
method arguments can be used to figure out which underlying data
source should be used (based on lat-lon bounds and agency ids for the
most part). The proxy is then automatically constructed using those
annotations.
This approach forces you to think about how the arguments of each
method in your primary service interface could be routed and keeps you
from creating new methods that can't properly be routed. I think
that's a good thing, because it will keep some other OTP developer who
isn't using the multi-graph feature from introducing a new method
that's critical to OTP operation that can't properly be directed to an
underlying graph. In practice, I've yet to come up with a method in
OBA that I haven't been able to properly route.
Either way, even if you don't go the proxy method, I'd definitely like
to propose the standardization around one primary service interface
through which all UI module => data module calls go, since it will
make dealing with issues like this more straightforward in the future.
Thank,
Brian
I'd like to post an update on the current state of the planned change.
There is two basic mechanisms: singleton mode (identical to current
implementation), and a "multiple file" mode, where the selection is
based on a path template. There would be some impact on Spring
configuration, especially for the "end-user":
1) web.xml in api-webapp does not include "application-context.xml"
anymore (as we don't always want to create by default singleton beans
anymore),
2) user configuration data-sources.xml for "singleton" mode:
------------------------------%<--------------------------------------
<import resource="classpath:org/opentripplanner/api/application-context.xml"
/>
<bean id="graphBundle" class="org.opentripplanner.model.GraphBundle">
<property name="path" value="/path/to/graph"/>
</bean>
<bean id="pathServiceFactory"
class="org.opentripplanner.routing.impl.SingletonPathServiceFactoryImpl">
<property name="pathService" ref="pathService" />
</bean>
------------------------------%<--------------------------------------
3) data-sources.xml for "multiple file" mode:
------------------------------%<--------------------------------------
<bean id="pathServiceFactory"
class="org.opentripplanner.routing.impl.MultipleFilePathServiceFactoryImpl">
<property name="pathPattern" value="/path/to/graph/{}.obj" />
<property name="subApplicationContext"
value="org/opentripplanner/api/application-context.xml" />
</bean>
------------------------------%<--------------------------------------
4) application-context.xml needs a new "pathService" bean (may be
removed later if I can make autowiring works on dynamic bean creation)
An abstract "GenericMultiplePathServiceFactory", implementing common
caching, memory check and automatic reload functionalities, is used by
the multiple file implementation. It could be later re-used for more
exotic usages such as loading graph from a database, etc... When
requesting a new PathService, it uses the given subApplicationContext
and the various auto-wiring annotations to create a new set of
services. The aim is to have the same application configuration for
both singleton and multiple mode, using the same
application-context.xml configuration. The actual implementation is
free to create the data-source bean (currently a "GraphBundle"),
requested by the GraphService instance, from whatever source it wants:
programmatically, using xml configuration files, etc...
The load process "handle" the case where the update of a new graph is
not done atomically. If a corrupted stream exception is encountered,
it tries to reload the graph before bailing out. User is supposed to
do the copy properly but my guess is that it would be better to be
lenient on this... We also try to handle out of memory by evicting the
oldest accessed service to reclaim memory if needed.
Attached a "preview patch" (routing and api-webapp projects).
Comments welcomed!
--Laurent
(1) Since response narrative generation now depends so heavily on
PathService, maybe it should be moved into its own class so that we
don't have to keep passing PathService down into each method.
(2) I don't like the changes to GraphBundle. Presently, it's true that
a graph is a single file. But I want the flexibility in the future to
change it to multiple files. For instance, I could easily see the
various services being broken out into their own files just to make it
easier to (say) query what calendar dates a graph runs on, or make store
patches elsewhere, or something.
(3) When reloading graphs, I wonder about both the performance and
safety of the code.
(a) For performance, maybe it would be better to give a user the
obsolete pathService and trigger the reload in the background. Then no
user would have to wait a long time for a result.
(b) It seems possible that a second user could trigger a reload while
the first one is still in the middle of a reload. Can you either tell
me why I'm wrong, or fix this?
(4) I am not super-happy with evicting unused graphs to save memory;
this seems likely to cause churn, and thus bad performance. Do you see
a use case for this? Also, when you are evicting graphs, you don't run
a gc, which you need to in order for freeMemory to return something
sensible.
(5) PathServiceFactory presently operates only by Id, but I thought you
had also planned to allow a choice by latitude / longitude of the
requests' start/end. Do you plan to add this? Shouldn't it be all one
API call, in case a user wants to route by both id and location?
Thanks for your comments.
> (1) Since response narrative generation now depends so heavily on
> PathService, maybe it should be moved into its own class so that we
> don't have to keep passing PathService down into each method.
That seems to me a good idea too, especially that mixing the public
API and the narrative code hides the former. I guess this would need a
revision of the PathService interface? But maybe this can be done
later. The "dynamic graph feature" is quite independent from that.
> (2) I don't like the changes to GraphBundle. Presently, it's true that
> a graph is a single file. But I want the flexibility in the future to
> change it to multiple files. For instance, I could easily see the
> various services being broken out into their own files just to make it
> easier to (say) query what calendar dates a graph runs on, or make store
> patches elsewhere, or something.
Are you talking about the change in the GraphBundle itself, or the way
it's dynamically instantiated in the MultiplePathFactory
implementation? The change on GraphBundle itself is optional, if we
revert it, we just miss the ability of having a graph file named after
the router ID. At first sight, that seemed to me rather arbitrary to
have to name the graph file "Graph.obj", that's the reason why I added
the file parameter, but after reading your comment I see your point.
Anyway, if you want to have this implemented, the way GraphBundle is
instantiated has to be modified. Maybe we should just limit ourselves
to create a GraphService in the PathFactory implementation?
> (3) When reloading graphs, I wonder about both the performance and
> safety of the code.
>
> (a) For performance, maybe it would be better to give a user the
> obsolete pathService and trigger the reload in the background. Then no
> user would have to wait a long time for a result.
I was thinking about this too, but I dismissed it for now for the
following reasons:
a) It's easy for any administrator / script which does the graph copy
to immediately force a "refresh" (wget /ws/metadata for example).
b) It opens a "can of worms" as it needs multi-threading, or at least
some library to do background processing. I'm here having AppEngine in
mind, where spawning threads is forbidden, and I'm not sure we want to
create this kind of dependency.
c) Performance for loading an average graph seems to be in the order
of "seconds" on a decent server, which have to compare to
"user-experience" requirements. That wasn't personally an issue, but I
lack experience on really big graphs.
d) Loading the new graph while keeping the old one "just in case"
increase memory requirements, especially when administrator update all
graphs in a bulk (worst case scenario would be a factor 2 on memory
requirement). And memory seems to be a critical point for OTP.
What could be done, alongside the usual CPU/RAM trad-off, is a middle
scenario: returning the old version while the new one is not ready.
That way only one request (the first one after the refresh) would be
blocked waiting, not all of them for the same graph. This may be more
worthwhile for high-load servers. But point d) would still be an
issue.
> (b) It seems possible that a second user could trigger a reload while
> the first one is still in the middle of a reload. Can you either tell
> me why I'm wrong, or fix this?
You are totally right, I completely forgot about this one. My initial
idea was to synchronize the whole method, but I later realized that
would have blocked all requests for all graphs when loading a single
one, and a finer grained locking mechanism was needed. So yes, proper
locking IS needed... I have to figure out the simplest and most
efficient locking mechanism. Anyway, that would depend on the chosen
background loading mechanism.
> (4) I am not super-happy with evicting unused graphs to save memory;
> this seems likely to cause churn, and thus bad performance. Do you see
> a use case for this? Also, when you are evicting graphs, you don't run
> a gc, which you need to in order for freeMemory to return something
> sensible.
For the call to gc(), you're probably right, and the best of all is
that the two others are probably not needed. Anyway, I'm too not
really happy with this, especially that the current code does not
solve everything. It seems I don't really know anymore what I was
trying to solve exactly :)
My original concern was that nothing ever unload old unused graphs.
The use case is graph renaming, graph testing, very infrequently used
graphs, etc... Having your server throwing a OutOfMemory exception
when loading a graph you need, just because you have 10 unused graphs
in memory, may be a bit unfair.
I hesitated with some kind of unused "timeout", but in that case you
have to somehow decide on the timeout period: 1 hour, 1 month?. You
could also have some sort of graph file removal detection, but that
imply another dependency on the way it's stored. And a specific web
service interface to manually "shutdown" graph seems a bit overkill,
error-prone, and adds administration complexity. Having to set memory
threshold seemed the less evil of all. Another possible solution is to
do nothing and way for the user to shutdown the server :)
> (5) PathServiceFactory presently operates only by Id, but I thought you
> had also planned to allow a choice by latitude / longitude of the
> requests' start/end. Do you plan to add this? Shouldn't it be all one
> API call, in case a user wants to route by both id and location?
Right. I don't use coordinates for now, but they maybe used later on.
I'll add both end-points to them.
--Laurent
What I would like is to have the graph *directory* named for the router
id.
I'm convinced by the memory argument -- it seems entirely plausible that
people will not want to spend money for double the RAM just to save the
occasional user a slow load. So we can keep this as-is.
> > (4) I am not super-happy with evicting unused graphs to save memory;
> > this seems likely to cause churn, and thus bad performance. Do you see
> > a use case for this? Also, when you are evicting graphs, you don't run
> > a gc, which you need to in order for freeMemory to return something
> > sensible.
>
> For the call to gc(), you're probably right, and the best of all is
> that the two others are probably not needed. Anyway, I'm too not
> really happy with this, especially that the current code does not
> solve everything. It seems I don't really know anymore what I was
> trying to solve exactly :)
>
> My original concern was that nothing ever unload old unused graphs.
> The use case is graph renaming, graph testing, very infrequently used
> graphs, etc... Having your server throwing a OutOfMemory exception
> when loading a graph you need, just because you have 10 unused graphs
> in memory, may be a bit unfair.
>
> I hesitated with some kind of unused "timeout", but in that case you
> have to somehow decide on the timeout period: 1 hour, 1 month?. You
> could also have some sort of graph file removal detection, but that
> imply another dependency on the way it's stored. And a specific web
> service interface to manually "shutdown" graph seems a bit overkill,
> error-prone, and adds administration complexity. Having to set memory
> threshold seemed the less evil of all. Another possible solution is to
> do nothing and way for the user to shutdown the server :)
I think a web service is probably a better bet, now that we have support
for admin users. Personally, I would not set a threshold just because I
don't trust that freeMemory is accurate, and because I don't think I
would ever want to unload a graph other than to reload it. But if you
think a memory threshold would be useful, please just provide a way to
turn it off.
> > (5) PathServiceFactory presently operates only by Id, but I thought you
> > had also planned to allow a choice by latitude / longitude of the
> > requests' start/end. Do you plan to add this? Shouldn't it be all one
> > API call, in case a user wants to route by both id and location?
>
> Right. I don't use coordinates for now, but they maybe used later on.
> I'll add both end-points to them.
Thanks.
> What I would like is to have the graph *directory* named for the router
> id.
Ok, agreed. Change is straightforward.
> I think a web service is probably a better bet, now that we have support
> for admin users. Personally, I would not set a threshold just because I
> don't trust that freeMemory is accurate, and because I don't think I
> would ever want to unload a graph other than to reload it. But if you
> think a memory threshold would be useful, please just provide a way to
> turn it off.
If I add a method to unload a graph, I would be glad to remove all
memory management stuff, I was not happy with it anyway. Does
"/unload?routerId=xxx" seems a reasonable interface?
I'll try to post a revised version soon, with "optimized locking" for
graph reload, but probably not before next week.
Thanks for the comments,
--Laurent
Attached is an updated version of the "multi-graph" feature, with some
of the changes we talked about. My remarks:
- There is a new "asyncReload" option, which set to true allow to
serve an old version of the service while the new one is loaded. But
the first request after the graph file has been updated still have the
burden to load the graph ifself. Default is false as it increase
greatly memory requirements: attached 2 jconsole screenshots showing
memory usage of 2 tests: asyncReload=true/false (the 2), test made
with a ~250Mb graph file. Memory usage pattern difference is clearly
visible: there is a spike when async reload is set, a dip in the
opposite case.
- I completely removed the previous "automagic" memory management
stuff, including auto-eviction which was not playing nice with "async
reloading" and a bit too dangerous anyway.
- I did not added (yet?) the from/to parameter for graph selection: I
don't know what to do for loading the metadata as there is no location
in the request.
- There was a memory leak in the previous version, one have to close()
any unused spring application context as spring keep references to it
internally.
- There is currently no way to unload a graph. I plan to add the
functionality (webservice) in a second update. For now you have to
reload the server.
- The factory builds the new PatchService alongside the PathService. I
did not really tested patching though, I'm not sure about the way it
works.
- The new security-application-context configuration is common to all
graphs, as it does not contains any beans definition.
- I still have to declare two beans in the application-context, as
auto-wiring does not work when loading a spring context
programmatically. I don't think it's a problem though, but if anybody
has a hint on this, feel free to help.
Any comments welcomed!
--Laurent
1. Users eventually have to modify a bit all their customized
data-source.xml file, adding an import and a bean:
------------------------------------------------------------------------------------
<bean id="graphBundle" class="org.opentripplanner.model.GraphBundle">
<property name="path" value="/path/to/graph" />
</bean>
+ <import resource="classpath:org/opentripplanner/api/application-context.xml"
/>
+
+ <bean id="pathServiceFactory"
+ class="org.opentripplanner.routing.impl.SingletonPathServiceFactoryImpl">
+ <property name="pathService" ref="pathService" />
+ </bean>
------------------------------------------------------------------------------------
And eventually to remove application-context.xml reference from their
web.xml (if they use their own version).
2. Patching is still not using the multi-graph feature. Apparently
there is a admin-webapp to use, but not added as a project yet? I'd
like to discuss about including this, maybe a bit later until patching
is more mature? For me it would be only a matter of adding the
routerId parameter to the 3 requests.
3. Attached a second patch with modification to "webapp" with the
routerId parameter in the config.js file. Default is empty (null), but
this can be easily adapted to provide an example of integration on
client side. I'm not sure if I should commit this though. Comments
welcomed anyway.
4. I'd like to update the on-line documentation on data-source.xml
usage, and to add a section on multi-graph support, what would be the
procedure for that?
Thanks,
--Laurent
I don't know if you already have seen this, but if not, maybe this could
be interesting for someone....
http://ec.europa.eu/transport/its/multimodal-planners/take-up-the-challenge/index_en.htm
I think this is a huge problem, and maybe a good opportunity to
demonstrate the power of federated services like the ones used in One
Bus Away.
Best regards.
Fran.
--
Fran Peñarrubia
Scolab
www.scolab.es
Asociación gvSIG
www.gvsig.com
Mike
2011/6/22 Francisco José Peñarrubia <fpen...@gmail.com>:
Except that it would be hard to get the necessary data for European
transit services. Maybe we could do a North America demo since we have
plenty of data there.
We could probably do door-to-door BosWash-scale trips with a only a few
modifications. Transcontinental trips would require better timezone support.
-Andrew
<bean id="pathServiceFactory"
class="org.opentripplanner.routing.impl.SingletonPathServiceFactoryImpl">
<property name="pathService" ref="pathService" />
</bean>
in opentripplanner-routing: org/opentripplanner/application-context.xml
Such that the singleton factory would be the default behavior (which
is probably what most developers are going to use in practice).
And then if you wish custom behavior, you THEN include a
"pathServiceFactory" bean definition in data-sources.xml that will
override the default behavior.
This way default behavior is reasonable, doesn't break backwards
compatibility, requires less initial config by user, and is still
customizable.
Thoughts?
Thanks,
Brian
--Laurent
Brian
On Wed, Jun 22, 2011 at 7:57 AM, Laurent Gregoire
Before: web.xml included
"classpath:org/opentripplanner/api/application-context.xml" and
"data-source.xml"
After: data-source.xml includes
"classpath:org/opentripplanner/api/application-context.xml" if you are
in "single graph" mode (the default), and is specified as a property
in the pathServiceFactory bean otherwise.
So we can revert everything back to the way it was, but if a user want
to switch to the multiple graph mode, they have to move the
application-context.xml include from their web.xml to their
data-source.xml. In other word, we can either focus on backward
compatibility or ease of configuration switch between the two modes.
The include of classpath:org/opentripplanner/application-context.xml
from classpath:org/opentripplanner/api/application-context.xml did not
change however.
HTH,
--Laurent
Yes, the admin-webapp is not really mature. It requires an install of
onebusaway with a specific configuration, but we might switch to a
different configuration at some point.
> 3. Attached a second patch with modification to "webapp" with the
> routerId parameter in the config.js file. Default is empty (null), but
> this can be easily adapted to provide an example of integration on
> client side. I'm not sure if I should commit this though. Comments
> welcomed anyway.
This patch looks good to me, but I'm not a front-end expert. Frank?
> 4. I'd like to update the on-line documentation on data-source.xml
> usage, and to add a section on multi-graph support, what would be the
> procedure for that?
You should have edit rights on trac, so please go right ahead and make
your edits! Let me know if you have any problem.