Document persistence, Architecture

Phil Warner

Aug 18, 2008, 12:06:25 PM

to htn-ea-devel

Greetings,

First of all, let me ask the following:

Does everyone agree that it isn't necessary to keep a copy of the
'resource' and 'inquiry' request documents? With the exception of
tracking who's making such requests, it doesn't really make sense to
use storage resources for this (and we can later add a component that
tracks such requests, if desired).
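
The tracking component mentioned above could stay very small. The following is a minimal sketch, assuming a Java implementation. `RequestTracker`, `record` and `count` are hypothetical names, not existing project code. It keeps only a counter per user agent and request type, so no request document is ever stored:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical tracking component: counts 'resource' and 'inquiry' requests
// per user agent without keeping a copy of any request document.
final class RequestTracker {
    // agentId -> (requestType -> running count)
    private final Map<String, Map<String, LongAdder>> counts = new ConcurrentHashMap<>();

    void record(String agentId, String requestType) {
        counts.computeIfAbsent(agentId, id -> new ConcurrentHashMap<>())
              .computeIfAbsent(requestType, type -> new LongAdder())
              .increment();
    }

    long count(String agentId, String requestType) {
        Map<String, LongAdder> perAgent = counts.get(agentId);
        if (perAgent == null) {
            return 0L;
        }
        LongAdder adder = perAgent.get(requestType);
        return adder == null ? 0L : adder.sum();
    }
}
```

Because only counters are kept, storage stays fixed per agent and request type no matter how often an agent polls.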

If you agree, please read on. Otherwise, I'll reconsider the
following and send a modified version.

-------------------

I've been thinking of changing the workflow (only slightly) since it
seems the top-level class (EmbeddedAgent--EA) is doing too much.

We can simplify the workflow (conceptually, at least) by moving the
request delegate (RD) nearer the interface. The RD can be the code
that accepts the request from the user agent (e.g., through a servlet
interface), and forwards it to the relevant underlying workflow.
It would then be the RD's responsibility to request document
validation and authorization.
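
Here is a rough sketch of an RD placed at the interface, under stated assumptions. `Workflow`, `RequestDelegate` and `accept` are hypothetical names, and authorization, validation and request-type detection are passed in as plain functions. They are not the project's real Auth and validation components:

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical workflow contract; the project's real interfaces may differ.
interface Workflow {
    String handle(String rtml);
}

// Sketch of a request delegate (RD) at the interface: it asks for
// authorization and validation, then forwards to the matching workflow.
final class RequestDelegate {
    private final Predicate<String> authorizer;     // is this user agent allowed in?
    private final Predicate<String> validator;      // is this acceptable RTML?
    private final Function<String, String> typeOf;  // 'resource', 'inquiry' or 'request'
    private final Map<String, Workflow> workflows;

    RequestDelegate(Predicate<String> authorizer, Predicate<String> validator,
                    Function<String, String> typeOf, Map<String, Workflow> workflows) {
        this.authorizer = authorizer;
        this.validator = validator;
        this.typeOf = typeOf;
        this.workflows = Map.copyOf(workflows);
    }

    String accept(String agentId, String rtml) {
        if (!authorizer.test(agentId)) {
            throw new SecurityException("unauthorized user agent: " + agentId);
        }
        if (!validator.test(rtml)) {
            throw new IllegalArgumentException("invalid RTML document");
        }
        String type = typeOf.apply(rtml);
        Workflow workflow = workflows.get(type);
        if (workflow == null) {
            throw new IllegalArgumentException("no workflow for request type: " + type);
        }
        return workflow.handle(rtml);
    }
}
```

A servlet, a command-line tool or any other edge component would only need to call `accept`. In this sketch the RD does not care how the request arrived.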

The EA orchestrator can be split up into separate (logical)
workflows: the 'resource' request (which likely doesn't need document
persistence), the 'inquiry' request (which, imo, also doesn't need
document persistence), and the 'request' request (for observation
requests; this does need document persistence).
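
One way to express which of these workflows persists its documents is shown in the minimal sketch below. The `RequestType` enum and `Orchestrator` class are hypothetical, and the in-memory list stands in for real storage:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: the three logical workflows, and whether each keeps a copy
// of the incoming document.
enum RequestType {
    RESOURCE(false),  // phase 0 query: answer it, keep nothing
    INQUIRY(false),   // scoring request: answer it, keep nothing
    REQUEST(true);    // observation request: keep it for later update/complete/fail messages

    final boolean persistent;

    RequestType(boolean persistent) {
        this.persistent = persistent;
    }
}

final class Orchestrator {
    // Stand-in for real document storage (database, files, ...).
    private final List<String> stored = new ArrayList<>();

    void process(RequestType type, String rtml) {
        if (type.persistent) {
            stored.add(rtml);
        }
        // Type-specific processing would follow here.
    }

    List<String> storedDocuments() {
        return List.copyOf(stored);
    }
}
```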

Another thought I had that would be simplified by the above model is
that the TelescopeEventHandler could then call a separate orchestrator
(patterned after the EA), which processes the data (e.g., updating the
RTML, etc.) and sends it to the PluginExecutor, and ultimately back to
the user (as we discussed, through direct connection to their user
agent or by query).
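
Below is a sketch of that second orchestrator, assuming a Java implementation. `DocumentManager`, `PluginExecutor` and `TelescopeEventOrchestrator` are hypothetical shapes borrowed from the component names in this thread, not the project's actual interfaces. The `<Status>` element is an invented placeholder for whatever RTML update the event requires:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces named after components in this thread;
// the real signatures may differ.
interface DocumentManager {
    String load(String requestId);
    void save(String requestId, String rtml);
}

interface PluginExecutor {
    void execute(String requestId, String rtml);  // e.g. deliver to the user agent
}

final class InMemoryDocumentManager implements DocumentManager {
    private final Map<String, String> documents = new HashMap<>();
    public String load(String requestId) { return documents.get(requestId); }
    public void save(String requestId, String rtml) { documents.put(requestId, rtml); }
}

// Second orchestrator, patterned after the EA but driven by telescope events.
final class TelescopeEventOrchestrator {
    private final DocumentManager documents;
    private final PluginExecutor plugins;

    TelescopeEventOrchestrator(DocumentManager documents, PluginExecutor plugins) {
        this.documents = documents;
        this.plugins = plugins;
    }

    void onTelescopeEvent(String requestId, String status) {
        String current = documents.load(requestId);
        if (current == null) {
            throw new IllegalStateException("no stored request " + requestId);
        }
        String updated = withStatus(current, status);
        documents.save(requestId, updated);
        plugins.execute(requestId, updated);
    }

    // Placeholder string edit; a real implementation would update the RTML
    // through a proper XML/RTML library.
    private static String withStatus(String rtml, String status) {
        return rtml.replace("</RTML>", "<Status>" + status + "</Status></RTML>");
    }
}
```

In this arrangement the request-handling path and the telescope-event path share only the document manager and the plugin executor.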

Remember that the application is configured through Spring XML
configuration. In other words, the application becomes the assembly
of components, rather than the individual components themselves.

I don't see any changes to the edge components, with the exception of
the EA interface and implementation (which will likely be wrapped by a
servlet or other web interface component).

I'll sit on this for the next 24 hours (giving those who are
interested a chance to respond), then make the above changes if there
are no objections and/or other suggestions.

Cheers,

Phil

Chris Mottram

Aug 18, 2008, 12:16:32 PM

to htn-ea-devel

On Mon, 18 Aug 2008, Phil Warner wrote:

>
> Does everyone agree that it isn't necessary to keep a copy of the
> 'resource' and 'inquiry' request documents? With the exception of
> tracking who's making such requests, it doesn't really make sense to
> use storage resources for this (and we can later add a component that
> tracks such requests, if desired).

I agree. The current TEA only keeps copies of successfully requested
observations. We only keep a copy of observations as a means of keeping
the information necessary to send asynchronous update / incomplete /
complete / fail documents back to the requesting agent. I can't at
the moment think of a reason you need a persistent copy of 'resource' and
'inquiry' request documents.

cheers

Chris

--
-----------------------------------------------------------------------------
Chris Mottram | email: c...@astro.livjm.ac.uk
Liverpool Telescope Programmer | phone: (0151) 231 2903
The Liverpool John Moores University. | fax: (0151) 231 2910
Astrophysics Research Institute. | WWW: www.livjm.ac.uk/astro/
Twelve Quays House. Egerton Wharf. |
Birkenhead. CH41 1LD |
------------------------------------------------------------------------------

Alasdair Allan

Aug 18, 2008, 12:22:38 PM

to htn-ea...@googlegroups.com

>> Does everyone agree that it isn't necessary to keep a copy of the
>> 'resource' and 'inquiry' request documents? With the exception of
>> tracking who's making such requests, it doesn't really make sense to
>> use storage resources for this (and we can later add a component that
>> tracks such requests, if desired).
>
> I agree. The current TEA only keeps copies of successfully requested
> observations. We only keep a copy of observations as a means of keeping
> the information necessary to send asynchronous update / incomplete /
> complete / fail documents back to the requesting agent. I can't at
> the moment think of a reason you need a persistent copy of 'resource' and
> 'inquiry' request documents.

Sorry, what are resource and inquiry documents?

Al.

Chris Mottram

Aug 18, 2008, 12:26:02 PM

to htn-ea...@googlegroups.com

On Mon, 18 Aug 2008, Alasdair Allan wrote:

>
> Sorry, what are resource and inquiry documents?

resource=phase0
inquiry=score(request)

Alasdair Allan

Aug 18, 2008, 12:35:42 PM

to htn-ea...@googlegroups.com

>> Sorry, what are resource and inquiry documents?
>
> resource=phase0

Ah. No, no need to store these I think.

> inquiry=score(request)

The user agent stores these; I don't see the need for the node agent
to store them. The node agent will get a large number of scoring
requests (from multiple user agents), and I don't see why it would
need to store them. The user agent sends out a smaller number of
scoring requests, and there is potentially useful information in
keeping track of these returned scores at that level (see the eSTAR
status pages for the microlensing programme, for example).

Al.

Ben Burleson

Aug 18, 2008, 12:59:18 PM

to htn-ea...@googlegroups.com

It might be useful to see patterns of abuse from user agents, such
as pinging a telescope's resources constantly. That's all I can think of.

Cheers,
Ben

Alasdair Allan

Aug 18, 2008, 1:04:43 PM

to htn-ea...@googlegroups.com

> It might be useful to see patterns of abuse from user agents, such
> as pinging a telescope's resources constantly. That's all I can
> think of.

Depends on what you think of as abuse, I guess; for some programmes
that's just normal strategy. The UA wants a running idea of what the
observing conditions are, so it can make a first-cut decision without
polling all the telescopes it knows about. That cuts down traffic and
improves the response time for time-critical decisions... of course,
the telescope operator might frown at this, so it sort of depends
where you're sitting... ;)

Al.

Alasdair Allan

Aug 18, 2008, 5:01:09 PM

to htn-ea...@googlegroups.com

So does anyone know why I didn't get a copy of this till hours after
the follow-up messages?

Al.

Ben Burleson

Aug 18, 2008, 5:13:17 PM

to htn-ea...@googlegroups.com

The delivery of these has been all jumbled for me as well.

Eric Saunders

Aug 19, 2008, 10:32:18 AM

to htn-ea-devel

Phil,

Not sure this is as simple a change as you think. There are several
distinct ideas here, and I'm not clear exactly what you're hoping to
achieve, so I'll ask my specific questions inline.

First, let me say that I agree: we don't need to manage state for
anything except actual observing requests. But I don't think any state
is currently managed in the existing architecture for anything except
requests. Scoring etc. passes straight through the management layer. I
agree that bypassing that component if it's not required sounds like a
good idea, but there are a few other points I'm a little wary about.

> We can simplify the workflow (conceptually, at least) by moving the
> request delegate (RD) nearer the interface. The RD can be the code
> that accepts the request from the user agent (e.g., through a servlet
> interface), and forwards it to the relevant underlying workflow.
> It would then be the RD's responsibility to request document
> validation and authorization.

In the current model, everything coming into the agent from the
outside world passes through the endpoint, and on to the Auth module.
It then goes through a Parse module before hitting the RD. The
advantages of doing this were

i) you can throw away invalid users/requests immediately (even before
the parse phase)
ii) you can reject broken RTML from valid users immediately
(protecting the rest of your internals and allowing them to assume
they are handling sanitised inputs)
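
The early-rejection property of the current model can be sketched as an ordered chain of stages, where the first rejection stops processing. All names here (`Stage`, `Pipeline`, `AuthStage`, `ParseStage`, `RejectedException`) are hypothetical, and the RTML check is only a placeholder for real schema validation:

```java
import java.util.List;
import java.util.Set;

// Hypothetical stage contract: pass the document on, or reject it.
interface Stage {
    String apply(String agentId, String rtml) throws RejectedException;
}

final class RejectedException extends Exception {
    final String stage;

    RejectedException(String stage, String reason) {
        super(stage + ": " + reason);
        this.stage = stage;
    }
}

// Endpoint -> Auth -> Parse -> RD, run in order; the first rejection stops it.
final class Pipeline {
    private final List<Stage> stages;

    Pipeline(List<Stage> stages) {
        this.stages = List.copyOf(stages);
    }

    String run(String agentId, String rtml) throws RejectedException {
        String current = rtml;
        for (Stage stage : stages) {
            current = stage.apply(agentId, current);
        }
        return current;
    }
}

final class AuthStage implements Stage {
    private final Set<String> knownAgents;

    AuthStage(Set<String> knownAgents) {
        this.knownAgents = Set.copyOf(knownAgents);
    }

    public String apply(String agentId, String rtml) throws RejectedException {
        if (!knownAgents.contains(agentId)) {
            throw new RejectedException("auth", "unknown agent " + agentId);
        }
        return rtml;
    }
}

final class ParseStage implements Stage {
    int documentsParsed = 0;  // shows that rejected agents never reach this stage

    public String apply(String agentId, String rtml) throws RejectedException {
        documentsParsed++;
        // Placeholder check; a real stage would validate against the RTML schema.
        if (!rtml.startsWith("<RTML") || !rtml.endsWith("</RTML>")) {
            throw new RejectedException("parse", "malformed RTML");
        }
        return rtml;
    }
}
```

A request from an unknown agent is rejected by `AuthStage` before `ParseStage` ever sees it. Everything after `ParseStage` can assume it is handling a document that passed the check.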

It seems that those two steps are common to all workflows (scoring,
inquiry or requests). So whatever workflow the request is forwarded
to, these steps have to happen. It's then not obvious to me how having
the RD request them is any simpler than the current situation?

I don't know very much about servlets. Are you saying that in this
model one must exist? One of the things I particularly like about our
current architecture is that you could write an extremely simple
endpoint (EP) that simply reads a file from local disk and passes the
resulting RTML string on to the next module (Auth). No web services,
sockets, servlets or carrier pigeons required. On its own this is
not much use to a remote user agent, but it is an extremely valuable
test case. Also, in principle a user could also hook arbitrary code up
to the front end, as long as it writes a file somewhere for the EP to
read (or even pipes it on STDIN). Is this simple scenario still
possible under this alternative proposal?
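
The file-or-STDIN endpoint described above can be very small. The minimal sketch below assumes Java 11 or later. `FileEndpoint` is a hypothetical name, and the next module (Auth) is modelled as a plain `Consumer<String>`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical minimal endpoint (EP): no web services, sockets or servlets.
// It only reads an RTML document and hands the string to the next module.
final class FileEndpoint {
    private final Consumer<String> next;  // e.g. the Auth module's entry point

    FileEndpoint(Consumer<String> next) {
        this.next = next;
    }

    // Read a whole RTML document from local disk.
    void submitFile(Path path) throws IOException {
        next.accept(Files.readString(path, StandardCharsets.UTF_8));
    }

    // Read a whole RTML document from a stream such as System.in.
    void submitStream(InputStream in) throws IOException {
        next.accept(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Reading from standard input would then be `new FileEndpoint(next).submitStream(System.in)`, where `next` is whatever entry point the Auth module exposes.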

> The EA orchestrator can be split up into separate (logical)
> workflows: the 'resource' request (which likely doesn't need document
> persistence), the 'inquiry' request (which, imo, also doesn't need
> document persistence), and the 'request' request (for observation
> requests; this does need document persistence).

Assuming "EA orchestrator" refers to the Document Manager, I
completely agree with the idea of separate logical workflows. But I'm
not sure this translates into much change to the architecture. For
example, you could have a concrete workflow subclass for each type of
request, and the version of the Document Manager you instantiate is
polymorphically determined based on request type (and could be linked
back to an existing reference if we're talking about a persistent
request). Two out of three of these concrete implementations do not do
any state persistence and are very simple internally. But all of them
look the same from the outside...
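
The polymorphic arrangement described above might look roughly like this. The class names are hypothetical, and the shared map stands in for the persistent store that a later message could be linked back to:

```java
import java.util.Map;

// Hypothetical: every workflow looks the same from the outside...
abstract class RequestWorkflow {
    abstract String process(String requestId, String rtml);

    // ...and the concrete subclass is chosen polymorphically from the request type.
    static RequestWorkflow forType(String type, Map<String, String> store) {
        switch (type) {
            case "resource":
            case "inquiry":
                return new PassThroughWorkflow(type);
            case "request":
                return new PersistentWorkflow(store);
            default:
                throw new IllegalArgumentException("unknown request type: " + type);
        }
    }
}

// Stateless: answers and forgets ('resource' and 'inquiry').
final class PassThroughWorkflow extends RequestWorkflow {
    private final String type;

    PassThroughWorkflow(String type) {
        this.type = type;
    }

    @Override
    String process(String requestId, String rtml) {
        return type + " handled";
    }
}

// Stateful: keeps observation requests so later messages can find them.
final class PersistentWorkflow extends RequestWorkflow {
    private final Map<String, String> store;

    PersistentWorkflow(Map<String, String> store) {
        this.store = store;
    }

    @Override
    String process(String requestId, String rtml) {
        store.put(requestId, rtml);
        return "request stored";
    }
}
```

Callers only ever see `RequestWorkflow.process`. Whether anything is persisted depends on which subclass `forType` returns.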

> Another thought I had that would be simplified by the above model is
> that the TelescopeEventHandler could then call a separate orchestrator
> (patterned after the EA), which processes the data (e.g., updating the
> RTML, etc.) and sends it to the PluginExecutor, and ultimately back to
> the user (as we discussed, through direct connection to their user
> agent or by query).

What advantage do you perceive to having a second orchestrator here?
Updating the RTML will require knowledge of the existing state. I
don't see how this subsequent processing can be decoupled from the
existing Document Manager?

The steps after that are the same as our current model (so that's
fine!).

> Remember that the application is configured through Spring XML
> configuration. In other words, the application becomes the assembly
> of components, rather than the individual components themselves.

This hasn't actually been agreed! Despite misgivings, I'm willing to
reserve judgement until I see Spring in action. But it would be a
mistake to base any of our architecture on the assumption that we will
be using Spring. One compelling use case is that if one wanted to
implement the same architecture in Python or Perl, that should
certainly be possible (and they don't have Spring). The component
assembly stage should always be possible to handcode (and ideally,
very simple to do so). I expect it will be, and perhaps (probably?) we
will use Spring for convenience in this implementation. But I think
it's a mistake to *rely* on it.
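
To illustrate that the assembly stage can be hand-coded, here is a minimal sketch with hypothetical `Auth`, `Parser` and `RequestDelegate` classes that take their collaborators as constructor arguments. `AgentAssembly.build` wires them by hand. A Spring XML file would describe the same object graph declaratively, and the same constructor-wiring pattern should carry over to Python or Perl:

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical components. Each receives its collaborators through its
// constructor, so nothing depends on a container such as Spring.
final class RequestDelegate {
    private final Map<String, UnaryOperator<String>> workflows;

    RequestDelegate(Map<String, UnaryOperator<String>> workflows) {
        this.workflows = Map.copyOf(workflows);
    }

    String handle(String rtml) {
        // Placeholder routing on a made-up attribute.
        String type = rtml.contains("mode=\"inquiry\"") ? "inquiry" : "request";
        return workflows.get(type).apply(rtml);
    }
}

final class Parser {
    private final RequestDelegate next;

    Parser(RequestDelegate next) {
        this.next = next;
    }

    String handle(String rtml) {
        return rtml.startsWith("<RTML") ? next.handle(rtml) : "rejected: not RTML";
    }
}

final class Auth {
    private final Parser next;

    Auth(Parser next) {
        this.next = next;
    }

    String handle(String agentId, String rtml) {
        return agentId.isEmpty() ? "rejected: anonymous" : next.handle(rtml);
    }
}

// Hand-coded assembly: the same object graph a Spring XML file would declare
// as bean definitions with constructor arguments.
final class AgentAssembly {
    static Auth build() {
        Map<String, UnaryOperator<String>> workflows = Map.of(
                "inquiry", rtml -> "score returned",
                "request", rtml -> "request queued");
        RequestDelegate delegate = new RequestDelegate(workflows);
        Parser parser = new Parser(delegate);
        return new Auth(parser);
    }
}
```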

> I don't see any changes to the edge components, with the exception of
> the EA interface and implementation (which will likely be wrapped by a
> servlet or other web interface component).

I don't either. As discussed above, though, I would be sorry to have
to mandate a web interface component for the EP.

Maybe a diagram of the proposed architecture would make it easier to
discuss? I will put up something representing what we had on the board
at the meeting as soon as I get a few minutes away from schedulers...

Cheers

Eric

euvitudo

Aug 19, 2008, 12:52:28 PM

to htn-ea...@googlegroups.com

Hi Eric,

See below for my responses.

On Tue, Aug 19, 2008 at 7:32 AM, Eric Saunders <eric.s...@gmail.com> wrote:
>
> Phil,
>
> Not sure this is as simple a change as you think. There are several
> distinct ideas here, and I'm not clear exactly what you're hoping to
> achieve, so I'll ask my specific questions inline.

It is actually rather simple. I'll explain.

> First, let me say that I agree: we don't need to manage state for
> anything except actual observing requests. But I don't think any state
> is currently managed in the existing architecture for anything except
> requests. Scoring etc. passes straight through the management layer. I
> agree that bypassing that component if it's not required sounds like a
> good idea, but there are a few other points I'm a little wary about.

So, with the new code/architecture, everything will be managed unless we
make an explicit decision as to what to manage, either in code or in
configuration; my suggestion was to allow this decision through
configuration. Again, I'll explain.

>> We can simplify the workflow (conceptually, at least) by moving the
>> request delegate (RD) nearer the interface. The RD can be the code
>> that accepts the request from the user agent (e.g., through a servlet
>> interface), and forwards it to the relevant underlying workflow.
>> It would then be the RD's responsibility to request document
>> validation and authorization.
>
> In the current model, everything coming into the agent from the
> outside world passes through the endpoint, and on to the Auth module.
> It then goes through a Parse module before hitting the RD. The
> advantages of doing this were
>
> i) you can throw away invalid users/requests immediately (even before
> the parse phase)
> ii) you can reject broken RTML from valid users immediately
> (protecting the rest of your internals and allowing them to assume
> they are handling sanitised inputs)

So, I wasn't as complete as I should have been. Point (i) would have
fallen out immediately, i.e., I would have seen the issue very soon,
and would have taken the appropriate steps to include that in the
workflow [I believe I did mention validation at that point--which covers
point (ii)].

> It seems that those two steps are common to all workflows (scoring,
> inquiry or requests). So whatever workflow the request is forwarded
> to, these steps have to happen. It's then not obvious to me how having
> the RD request them is any simpler than the current situation?

Agreed, and it also doesn't make it any more complex.

> I don't know very much about servlets. Are you saying that in this
> model one must exist? One of the things I particularly like about our
> current architecture is that you could write an extremely simple
> endpoint (EP) that simply reads a file from local disk and passes the
> resulting RTML string on to the next module (Auth). No web services,
> sockets, servlets or carrier pigeons required. On its own this is
> not much use to a remote user agent, but it is an extremely valuable
> test case. Also, in principle a user could also hook arbitrary code up
> to the front end, as long as it writes a file somewhere for the EP to
> read (or even pipes it on STDIN). Is this simple scenario still
> possible under this alternative proposal?

So, servlets were just one example. I don't see this as prohibiting
a command-line client as well.

To put it another way: the framework is agnostic about how the request
arrives, be it a servlet, a command-line tool, a GUI, etc. That is
handled outside the framework by another component, a 'client' of the
framework.

>> The EA orchestrator can be split up into separate (logical)
>> workflows: the 'resource' request (which likely doesn't need document
>> persistence), the 'inquiry' request (which, imo, also doesn't need
>> document persistence), and the 'request' request (for observation
>> requests; this does need document persistence).
>
> Assuming "EA orchestrator" refers to the Document Manager, I
> completely agree with the idea of separate logical workflows. But I'm
> not sure this translates into much change to the architecture. For
> example, you could have a concrete workflow subclass for each type of
> request, and the version of the Document Manager you instantiate is
> polymorphically determined based on request type (and could be linked
> back to an existing reference if we're talking about a persistent
> request). Two out of three of these concrete implementations do not do
> any state persistence and are very simple internally. But all of them
> look the same from the outside...

"EA Orchestrator" refers to the main component we were building. The
Java interface is "EmbeddedAgent" and the default implementation (in
-core) is "EmbeddedAgentBean".

You are correct in that it introduces only a tweak in the architecture.

As for what you mention about 'concrete workflows': in Spring, you
instantiate the component and assign its fields (depending on which
workflow it represents). You do this for each workflow, then assign
each workflow to the delegate (RD) so that the RD can decide which
workflow should be used. You don't even have to use polymorphism (in
the sense of creating subclasses) unless you really want to.
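
The configuration-based alternative can be sketched with one workflow class whose instances differ only in their fields. `ConfigurableWorkflow` and `WorkflowDelegate` are hypothetical names, and passing `null` as the store is an assumption used here to mean "do not persist":

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: one workflow class; instances differ only in configuration.
final class ConfigurableWorkflow {
    private final String name;
    private final Map<String, String> store;  // null: this workflow does not persist

    ConfigurableWorkflow(String name, Map<String, String> store) {
        this.name = name;
        this.store = store;
    }

    String process(String requestId, String rtml) {
        if (store == null) {
            return name + " handled";
        }
        store.put(requestId, rtml);
        return name + " stored";
    }
}

// The RD is given one configured instance per request type and picks between them.
final class WorkflowDelegate {
    private final Map<String, ConfigurableWorkflow> workflows = new HashMap<>();

    void register(String requestType, ConfigurableWorkflow workflow) {
        workflows.put(requestType, workflow);
    }

    String dispatch(String requestType, String requestId, String rtml) {
        ConfigurableWorkflow workflow = workflows.get(requestType);
        if (workflow == null) {
            throw new IllegalArgumentException("no workflow for " + requestType);
        }
        return workflow.process(requestId, rtml);
    }
}
```

Each `register` call corresponds to what a Spring XML file would express as a separate bean definition of the same class with different property values. No subclass is involved.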

>
>> Another thought I had that would be simplified by the above model is
>> that the TelescopeEventHandler could then call a separate orchestrator
>> (patterned after the EA), which processes the data (e.g., updating the
>> RTML, etc.) and sends it to the PluginExecutor, and ultimately back to
>> the user (as we discussed, through direct connection to their user
>> agent or by query).
>
> What advantage do you perceive to having a second orchestrator here?
> Updating the RTML will require knowledge of the existing state. I
> don't see how this subsequent processing can be decoupled from the
> existing Document Manager?

The second orchestrator would take the request (likely from the RD, or
some other component near the edge) and use the DocumentManager
for persistence, then forward it on to the PluginExecutor.

The DocumentManager can either be made responsible for understanding
the state (which may be too much responsibility for the
DocumentManager), or some other component can work in coordination
with the DocumentManager to manage the state (e.g., for updates).
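
The second option can be sketched as follows: the `DocumentManager` only stores documents, and a separate, hypothetical `RequestStateTracker` owns the state rules. The states follow the update / incomplete / complete / fail documents Chris mentioned earlier. Treating incomplete, complete and fail as final is an assumption:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lifecycle of a persisted observation request.
enum ObservationState {
    ACCEPTED, UPDATED, INCOMPLETE, COMPLETE, FAILED;

    // Assumption: incomplete, complete and fail are all final messages.
    boolean isFinal() {
        return this == INCOMPLETE || this == COMPLETE || this == FAILED;
    }
}

// The document manager only stores documents...
final class DocumentManager {
    private final Map<String, String> documents = new HashMap<>();
    void save(String requestId, String rtml) { documents.put(requestId, rtml); }
    String load(String requestId) { return documents.get(requestId); }
}

// ...while a separate component owns the rules about state changes.
final class RequestStateTracker {
    private final DocumentManager documents;
    private final Map<String, ObservationState> states = new HashMap<>();

    RequestStateTracker(DocumentManager documents) {
        this.documents = documents;
    }

    void accept(String requestId, String rtml) {
        documents.save(requestId, rtml);
        states.put(requestId, ObservationState.ACCEPTED);
    }

    void transition(String requestId, ObservationState next, String rtml) {
        ObservationState current = states.get(requestId);
        if (current == null) {
            throw new IllegalStateException("unknown request " + requestId);
        }
        if (current.isFinal()) {
            throw new IllegalStateException(requestId + " is already " + current);
        }
        documents.save(requestId, rtml);
        states.put(requestId, next);
    }

    ObservationState stateOf(String requestId) {
        return states.get(requestId);
    }
}
```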

> The steps after that are the same as our current model (so that's
> fine!).
>
>
>> Remember that the application is configured through Spring XML
>> configuration. In other words, the application becomes the assembly
>> of components, rather than the individual components themselves.
>
> This hasn't actually been agreed! Despite misgivings, I'm willing to
> reserve judgement until I see Spring in action. But it would be a
> mistake to base any of our architecture on the assumption that we will
> be using Spring. One compelling use case is that if one wanted to
> implement the same architecture in Python or Perl, that should
> certainly be possible (and they don't have Spring). The component
> assembly stage should always be possible to handcode (and ideally,
> very simple to do so). I expect it will be, and perhaps (probably?) we
> will use Spring for convenience in this implementation. But I think
> it's a mistake to *rely* on it.

Well, also remember that we did discuss the point. We agreed that
someone could work with the framework by coding it all up in Java.
Whether it is configured through Spring or hard-coded into Java classes
really doesn't matter. Spring simply makes it easier (yeah, I'll
provide an example).

For Perl or Python, I don't see the lack of Spring as a disadvantage,
necessarily. As I mentioned, you can still hard-code usage of the
framework.

>> I don't see any changes to the edge components, with the exception of
>> the EA interface and implementation (which will likely be wrapped by a
>> servlet or other web interface component).
>
> I don't either. As discussed above, though, I would be sorry to have
> to mandate a web interface component for the EP.

So, again, this was an example.

> Maybe a diagram of the proposed architecture would make it easier to
> discuss? I will put up something representing what we had on the board
> at the meeting as soon as I get a few minutes away from schedulers...

Sure. I'll put something together and send it along (or put it on the wiki).

In summary, I don't believe these changes really affect anyone, but
since the responsibilities of the relevant components may change, I
thought I'd point out that I want to modify them.

Cheers,

Phil

Eric Saunders

Aug 21, 2008, 12:13:04 PM

to htn-ea...@googlegroups.com

Hi Phil

Thanks for the clarifications. I am suitably reassured. ;)

I'd still like to see a diagram, but seeing as no one else has any
other concerns, please go ahead and implement your architecture
changes.

The other thing that your discussion points out to me is that an
explicit definition of the perceived responsibilities of each
component would be very useful. A wiki page sounds ideal for this.
I'll start one off if I get a chance, although I'm away tomorrow and
Monday.

Just trying to draw the EA architecture diagram from the meeting now. ;)

Cheers

Eric

euvitudo

Aug 21, 2008, 3:53:01 PM

to htn-ea...@googlegroups.com

On Thu, Aug 21, 2008 at 9:13 AM, Eric Saunders <eric.s...@gmail.com> wrote:
>
> Hi Phil
>
> Thanks for the clarifications. I am suitably reassured. ;)
>
> I'd still like to see a diagram, but seeing as no one else has any
> other concerns, please go ahead and implement your architecture
> changes.

Will do.

> The other thing that your discussion points out to me is that an
> explicit definition of the perceived responsibilities of each
> component would be very useful.

Seems reasonable, and in line with documentation requirements.

> A wiki page sounds ideal for this. I'll start one off if I get a chance,
> although I'm away tomorrow and Monday.

Yes, and I've been away as well. I'll hopefully have a diagram ready
sometime this afternoon.

> Just trying to draw the EA architecture diagram from the meeting now. ;)

So, I'll send what I have along as soon as I update the diagrams.

Cheers,

Phil
