More detailed docs?

shlomo

Aug 5, 2008, 3:06:56 PM
to lifeguard-dev
Hi,

I'm trying to figure out the steps involved in creating a new service.

I know I need to create a new subclass of AbstractBaseService. But I
am not sure about the following:
- I need some kind of XML describing the workflow. How do I create
this XML - based on what, and how do I create the associated class?
- Where is there documentation about how to pass parameters to
services?

I've checked out the project wiki and I don't see anything to help
beyond the Getting Started guide - which leaves off at running the
canned services from the lifeguard AMI.

Any other docs available to cover my questions?

Thanks.

.. Shlomo

David Kavanagh

Aug 5, 2008, 3:18:22 PM
to lifegu...@googlegroups.com
I'll try to answer you here and perhaps there is more documentation I
can create to make it easier for others.

Yes, extending AbstractBaseService is the first step. There are some
examples in the workspace on how to do this.
Once you have your service running (either running it manually, or by
using lifeguard to run your service AMI), you feed it messages via
SQS. There are a couple of ways to do this. The best way to start is
by running the sample ingestor included in the workspace. The
IngestorBase class does all of the heavy lifting for you. It has been
extended with the FileIngestor. That sample app takes a bunch of
arguments that control it. Here's the usage string:
Usage: FileIngestor <aws.props> <project> <batch> <bucket> <workflow.xml> file1 ...

Here are more details about those params:
aws.props - the location of the aws.properties file, which includes
  your access id, etc.
project - the name of the project (used mostly for record keeping)
batch - the batch name/number you'll assign to this run of data
  (useful for record keeping)
bucket - the bucket to use for storing your input and output files
workflow.xml - a file that defines the workflow you will run. For your
  case, the workflow will be a single service. Examples are in the
  workspace.
file1 ... - one or more files to be submitted

The FileIngestor will read your files and send them to S3. It will
also send work requests that the service will read before working on
your file(s).
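
For illustration, the argument layout in the usage string above could be
captured like this. It is only a sketch: the IngestArgs class and its
field names are hypothetical and are not part of Lifeguard, and the real
FileIngestor may validate its arguments differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the arguments named in the usage string.
// Illustration only; this is not the parsing code FileIngestor itself uses.
final class IngestArgs {
    static final String USAGE =
        "Usage: FileIngestor <aws.props> <project> <batch> <bucket> <workflow.xml> file1 ...";

    final String awsProps;     // path to aws.properties (access id, etc.)
    final String project;      // project name, for record keeping
    final String batch;        // batch name/number for this run
    final String bucket;       // bucket for input and output files
    final String workflowXml;  // path to the workflow definition
    final List<String> files;  // one or more files to submit

    private IngestArgs(String[] a) {
        awsProps = a[0];
        project = a[1];
        batch = a[2];
        bucket = a[3];
        workflowXml = a[4];
        files = Collections.unmodifiableList(
            new ArrayList<>(Arrays.asList(a).subList(5, a.length)));
    }

    // Five fixed arguments plus at least one file are required.
    static IngestArgs parse(String... args) {
        if (args.length < 6) {
            throw new IllegalArgumentException(USAGE);
        }
        return new IngestArgs(args);
    }
}
```

With this sketch, IngestArgs.parse("aws.properties", "demo", "batch-1",
"my-bucket", "workflow.xml", "a.txt") yields one file for batch-1, and
passing fewer than six arguments raises an exception carrying the usage
string.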

Does that help?

David

shlomo

Aug 5, 2008, 5:48:24 PM
to lifeguard-dev
Yes, that helps a bit. I have some more basic questions:

1) Is sampleWorkflow.xml the model to follow in order to create a
multi-step workflow?

2) Where is the XML schema / DTD describing workflows? The link inside
sampleWorkflow.xml pointing to
<Workflow xmlns="http://lifeguard-ws.directthought.com/doc/2007-11-20/">
is not working.

3) In my workflows (and probably every workflow, though I could be
wrong) each step in the flow outputs a product that is the same format
as is required by the input of the next step. This means that the
OutputType of step (x-1) must be the same as the InputType of step
(x), right?

4) [OK, maybe this is not a basic question.] How can I create a
workflow that requires services with more than one message channel?
For example, my workflow involves these three steps, each of which is
a service:
1) Compute the list of passengers who will be on this flight.
2) Collect a single passenger's history (say, his in-flight-purchases)
from multiple, distributed sources
3) Once all the purchase histories for all the passengers on this
flight are ready, compute the list of products to stock on this flight
to maximize expected revenues from in-flight purchases. (This whole
service #3 could possibly be implemented as two separate services, in
series: first service waits until they're all ready, the second
service computes).

[Air travel is not really my problem domain, but a useful analogy with
rich terminology.]

In order to implement service #3, I need to give it two separate input
types:
1 - a list of passengers on the flight, so it can know when all the
purchase histories have arrived. Sent by service 1.
2 - the purchase history of each customer, sent by service #2.

I believe I need two separate SQS queues to implement this service,
one for passenger lists and one for purchase histories. (Please
correct me if I am mistaken about needing two separate channels). How
would I configure lifeguard to support this type of relationship
between services?

Thanks.

.. Shlomo

David Kavanagh

Aug 10, 2008, 3:52:50 PM
to lifegu...@googlegroups.com
Hi. I went out of town, but I'm back now. I'll try to answer your
questions below.

On Tue, Aug 5, 2008 at 5:48 PM, shlomo <shlomo....@gmail.com> wrote:
>
> Yes, that helps a bit. I have some more basic questions:
>
> 1) Is sampleWorkflow.xml the model to follow in order to create a
> multi-step workflow?

Yes. When you specify services, make sure that at least one output
MIME type on one service matches the input MIME type on the next
service in the workflow.
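
The matching rule above can be sketched as a small check over a list of
steps. This is an illustration only: the Step class and firstMismatch
method are hypothetical names, not Lifeguard classes, and the sketch
assumes each service declares one input MIME type and one or more output
MIME types.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WorkflowChainCheck {
    // Hypothetical description of one service in a workflow.
    static final class Step {
        final String name;
        final String inputType;
        final Set<String> outputTypes;

        Step(String name, String inputType, String... outputTypes) {
            this.name = name;
            this.inputType = inputType;
            this.outputTypes = new HashSet<>(Arrays.asList(outputTypes));
        }
    }

    // Returns the index of the first step whose input MIME type is not
    // among the previous step's output MIME types, or -1 if the chain is fine.
    static int firstMismatch(List<Step> steps) {
        for (int i = 1; i < steps.size(); i++) {
            Step previous = steps.get(i - 1);
            Step current = steps.get(i);
            if (!previous.outputTypes.contains(current.inputType)) {
                return i;
            }
        }
        return -1;
    }
}
```

A step that emits both text/xml and image/png can feed a step that
accepts text/xml; only one of the outputs has to match.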

>
> 2) Where is the XML schema / DTD describing workflows? The link inside
> sampleWorkflow.xml pointing to
> <Workflow xmlns="http://lifeguard-ws.directthought.com/doc/2007-11-20/">
> is not working.

The schemas are in the workspace. Here's a link in the subversion
tree: http://code.google.com/p/lifeguard/source/browse/trunk/xsd/
BTW, the namespace indicator looks like a URL, but it doesn't need to
be working. It is just a way to differentiate one schema from another.
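
To illustrate the point about namespaces, here is a small sketch using
the JDK's standard DOM parser. The NamespaceDemo class is hypothetical
and not part of Lifeguard. A namespace-aware parser reports the
namespace as a plain string and does not need the address to resolve.
The sketch parses only an empty root element, since the child elements
of a real workflow are defined by the XSDs in the workspace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class NamespaceDemo {
    // Parses the XML text and returns the namespace URI of its root element.
    // The URI is treated as an opaque identifier; nothing is downloaded.
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return root.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Calling rootNamespace on
<Workflow xmlns="http://lifeguard-ws.directthought.com/doc/2007-11-20/"/>
returns that same string, whether or not anything is served at the
address.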

>
> 3) In my workflows (and probably every workflow, though I could be
> wrong) each step in the flow outputs a product that is the same format
> as is required by the input of the next step. This means that the
> OutputType of step (x-1) must be the same as the InputType of step
> (x), right?

Yes. As I mentioned above, just one of the output types needs to match
the input of the next. What this implies is that a service might
produce multiple files, which some do.

>
> 4) [OK, maybe this is not a basic question.] How can I create a
> workflow that requires services with more than one message channel?
> For example, my workflow involves these three steps, each of which is
> a service:
> 1) Compute the list of passengers who will be on this flight.
> 2) Collect a single passenger's history (say, his in-flight-purchases)
> from multiple, distributed sources
> 3) Once all the purchase histories for all the passengers on this
> flight are ready, compute the list of products to stock on this flight
> to maximize expected revenues from in-flight purchases. (This whole
> service #3 could possibly be implemented as two separate services, in
> series: first service waits until they're all ready, the second
> service computes).

I guess I'd think about what part of that needs to scale. Perhaps you
have 1 be a workflow (by itself), then 2 and 3 are their own workflow
or another service. Once 1 is done, retrieve that list and generate
work requests (one for each passenger).

>
> [Air travel is not really my problem domain, but a useful analogy with
> rich terminology.]
>
> In order to implement service #3, I need to give it two separate input
> types:
> 1 - a list of passengers on the flight, so it can know when all the
> purchase histories have arrived. Sent by service 1.
> 2 - the purchase history of each customer, sent by service #2.

Would the scenario above work? If 1 generates a list, and each
passenger on the list needs some work done, it seems like running 1
once needs to feed N requests to 2 and 3.
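
The fan-out described above (one run of step 1 producing N requests,
with the final computation waiting for all N results) can be sketched in
memory as follows. This is an assumption-laden illustration: FlightBatch
and its methods are hypothetical, the request strings are placeholders
rather than Lifeguard's work request format, and no SQS calls are made.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker: knows the passenger list from step 1 and
// collects the per-passenger results from step 2.
final class FlightBatch {
    private final Set<String> pending;
    private final Map<String, String> histories = new HashMap<>();

    FlightBatch(Collection<String> passengers) {
        this.pending = new HashSet<>(passengers);
    }

    // One work request per passenger; the text format is a placeholder.
    static List<String> workRequests(String flight, List<String> passengers) {
        List<String> requests = new ArrayList<>();
        for (String passenger : passengers) {
            requests.add(flight + ":" + passenger);
        }
        return requests;
    }

    // Records one passenger's history. Unknown or repeated passengers are
    // ignored. Returns true once every expected history has arrived.
    boolean record(String passenger, String history) {
        if (pending.remove(passenger)) {
            histories.put(passenger, history);
        }
        return pending.isEmpty();
    }

    // Copy of the histories collected so far.
    Map<String, String> collected() {
        return new HashMap<>(histories);
    }
}
```

In this sketch the passenger list is supplied when the batch is created,
so step 3 does not need a second message channel to learn how many
results to expect. Whether that fits a given deployment depends on where
the tracker's state lives.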
