Service without an input file

shlomo

Aug 10, 2008, 12:48:51 PM
to lifeguard-dev
Hi,

I have a service whose input is completely specified by the parameters
passed in the message - it doesn't require any files in S3, neither as
input nor as output.

I see that the AbstractBaseService always tries to get the input file
from S3. While I can hack this code so it checks a parameter (or some
similar hack) to decide if it needs the file or not, I'm interested in
a better, more reusable solution that can also help others.

What changes do you recommend?

I'd be happy to submit a patch for this.

.. Shlomo


David Kavanagh

Aug 10, 2008, 3:59:43 PM
to lifegu...@googlegroups.com
I'm already editing AbstractBaseService. I think not having an input
file could be a valid use case, so let me adjust the schema and the
code to allow that. I'll simply pass a null into the executeService()
method.

David

David Kavanagh

Aug 12, 2008, 2:44:24 PM
to lifegu...@googlegroups.com
I have this feature put in, and will be testing more before I commit
it. I have about 4 things that I've changed and need to test properly.

David

shlomo....@gmail.com

Aug 12, 2008, 4:28:22 PM
to lifegu...@googlegroups.com
Thanks!

Does it also support the case of having no output files? My service
performs offline housekeeping tasks for my S3-hosted files - e.g.
deleting unneeded objects - which requires no input (besides the
filename passed as a param) and produces no output.

Actually, I'm thinking that it might be more correct to send a status
report "file" as text/plain output, a few lines indicating the
success/failure of the operation.

Still, it may be a valid use case to have a service that produces no output.

.. Shlomo

Chris Liebman

Aug 12, 2008, 4:51:10 PM
to lifegu...@googlegroups.com
No output already works, or I have hacked it and not sent a
patch.... ;-) We have a chain of 3 services, the last of which loads
data into a DB and has no output.

-- Chris

David Kavanagh

Aug 12, 2008, 4:52:42 PM
to lifegu...@googlegroups.com
Yes, I'm pretty sure no output works fine. :-)

shlomo....@gmail.com

Aug 12, 2008, 5:01:13 PM
to lifegu...@googlegroups.com
Cool. Thanks.

I also just realized that the params in the request are persisted
across from this service to the subsequent services in the workflow --
including modifications made by this service. That's great, and using
this I can chain together a few zero-input services based solely on
the params. This is really cool.
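
The param hand-off described above can be pictured with a small sketch. It assumes, purely for illustration, that the request params behave like a mutable name/value map that each service in the chain receives and may modify. `ParamChainSketch`, `runChain`, and the `housekeeping.*` param names are hypothetical and not part of Lifeguard.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Illustration only: models request params as a map passed down a chain of services. */
final class ParamChainSketch {

    /** Runs each service in order against one shared copy of the initial params. */
    static Map<String, String> runChain(Map<String, String> initialParams,
                                        List<Consumer<Map<String, String>>> services) {
        Map<String, String> params = new LinkedHashMap<>(initialParams);
        for (Consumer<Map<String, String>> service : services) {
            service.accept(params); // a service may read, add, or overwrite params
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> initial = new LinkedHashMap<>();
        initial.put("housekeeping.target", "old-report.csv"); // hypothetical param name

        // The first service records a status and the second reads it; neither needs a file.
        Consumer<Map<String, String>> first = p -> p.put("housekeeping.status", "deleted");
        Consumer<Map<String, String>> second =
                p -> p.put("housekeeping.summary",
                           p.get("housekeeping.target") + ": " + p.get("housekeeping.status"));

        System.out.println(runChain(initial, Arrays.asList(first, second)));
    }
}
```

Copying the initial map keeps the caller's params untouched, so the same starting params can seed several independent requests.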

David Kavanagh

Aug 12, 2008, 5:07:58 PM
to lifegu...@googlegroups.com
:-) Glad you like the design! I intended for all services to be able
to see parameters passed in the initial request.

David

shlomo....@gmail.com

Aug 13, 2008, 5:15:23 AM
to lifegu...@googlegroups.com
> Yes, I'm pretty sure no output works fine. :-)

When I return a null List<MetaFile> from executeService I get a
NullPointerException, caused by the following code in
AbstractBaseService:

    List<MetaFile> results = executeService(inputFile, request);
    inputFile.delete();
    logger.debug("service produced " + results.size() + " results");

An obvious workaround is to return an empty List, but I think it's
"cleaner" to support a null return value as well.

Can this be changed as well?

David Kavanagh

Aug 13, 2008, 8:57:17 AM
to lifegu...@googlegroups.com
I just added a check for null before deleting the input file, and
changed the debug output line to look like this:

    logger.debug("service produced " + ((results == null) ? 0 : results.size()) + " results");
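
The two null checks described above can be isolated into small helpers. This is only a sketch: `NullSafeResults`, `orEmpty`, and `deleteIfPresent` are hypothetical names, and the result list is typed generically rather than as Lifeguard's `List<MetaFile>`.

```java
import java.io.File;
import java.util.Collections;
import java.util.List;

/** Illustration only: tolerate a missing input file and a null result list. */
final class NullSafeResults {

    /** Treats a null result list as "the service produced no output". */
    static <T> List<T> orEmpty(List<T> results) {
        return results == null ? Collections.<T>emptyList() : results;
    }

    /** Deletes the temporary input file only when there was one; returns true if it was deleted. */
    static boolean deleteIfPresent(File inputFile) {
        return inputFile != null && inputFile.delete();
    }

    public static void main(String[] args) {
        List<String> results = orEmpty(null); // the service returned null
        deleteIfPresent(null);                // the service had no input file
        System.out.println("service produced " + results.size() + " results");
    }
}
```

Normalizing to an empty list once means later code can call `size()` or iterate without repeating the null test.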

I'll be running this more today.. I got hung-up trying to test
yesterday.. not much progress.

David

shlomo

Aug 13, 2008, 1:04:56 PM
to lifeguard-dev
I think IngestorBase will also need to change to support workflows
whose first service requires no files. My workflow does not require
any file to be ingested, just some parameters set on the WorkRequest
(i.e. DeleteFileService needs the "delFile.s3Bucket" and
"delFile.s3ObjectKey" params to be set). IngestorBase as it is now
does not send a WorkRequest at all if there are no files.

Also, can you please tell me whether the recommended way to set
request-specific params is to set their values on the Services that
comprise the Workflow before passing the Workflow to IngestorBase's
constructor? Like this:

    // just like in FileIngestor
    Workflow workflow = JAXBuddy.deserializeXMLStream(Workflow.class,
            new FileInputStream(args[4]));

    // now set the params needed by DeleteFileService
    ParamType bucketParam = new ParamType();
    bucketParam.setName("delFile.s3Bucket");
    bucketParam.setValue("mystoragebucket");
    // the first service is the DeleteFileService
    workflow.getServices().get(0).getParams().add(bucketParam);

    // now the workflow is ready to be passed to IngestorBase's constructor

Is the above the recommended way to set request-specific params?

Actually, from looking at IngestorBase's code, it doesn't seem to
matter which Service in the Workflow you add the param to, since they
are collected together into the WorkRequest....

Thanks.

.. Shlomo


shlomo....@gmail.com

Aug 14, 2008, 7:34:12 AM
to lifeguard-dev
Upon further reflection, in order to support file-less workflows, I
think a new class PropertiesIngestor is needed, one that supports
ingesting a set of request-specific parameters. Something like this:

-----
public void ingest(List<Properties> properties) {
    /* main body of this method is the same as in the
       IngestorBase.ingest(List<File>) method, up until the for loop
       over the Files */

    for (Properties props : properties) {
        long startTime = System.currentTimeMillis();
        wr.setInput(null);
        // set the WorkRequest params
        for (ParamType param : wr.getParams()) {
            String paramKey = param.getName();
            boolean hasPropertySet = props.containsKey(paramKey);
            if (hasPropertySet) {
                param.setValue(props.getProperty(paramKey));
            }
        }

        /* and the rest of the method is also the same */
        /* actually, we need to remove references to File variables,
           but those are obvious */
    }
}
-----

The idea is that each Properties contains the params as they should be
set for one specific request.
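
The per-request override step in that loop can be tried in isolation. The sketch below is self-contained and makes some assumptions: `ParamType` here is a minimal stand-in for the real generated class, and `ParamOverrideSketch` and `applyProperties` are hypothetical names. The param names are borrowed from the usage example in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustration only: apply one request's Properties to an existing param list. */
final class ParamOverrideSketch {

    /** Minimal stand-in for the generated ParamType class (name and value only). */
    static final class ParamType {
        private final String name;
        private String value;

        ParamType(String name, String value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
    }

    /**
     * Overwrites the value of each param whose name appears as a key in props.
     * Params without a matching property keep their existing value, and
     * properties that match no param are ignored.
     */
    static void applyProperties(List<ParamType> params, Properties props) {
        for (ParamType param : params) {
            String override = props.getProperty(param.getName());
            if (override != null) {
                param.setValue(override);
            }
        }
    }

    public static void main(String[] args) {
        List<ParamType> params = new ArrayList<>();
        params.add(new ParamType("delFile.bucket", ""));
        params.add(new ParamType("delFile.objectKey", ""));

        Properties props = new Properties();
        props.setProperty("delFile.bucket", "mybucket");
        props.setProperty("delFile.objectKey", "objectWaitingForDeletion");

        applyProperties(params, props);
        for (ParamType param : params) {
            System.out.println(param.getName() + " = " + param.getValue());
        }
    }
}
```

Note that this approach only overrides params that already exist on the request; a property with no matching param is silently dropped.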

For example, a DeleteFileService requires no input files, only two
params: the S3 bucket name and the object key. The PropertiesIngestor
can be used as follows:

-----
PropertiesIngestor ingestor = new PropertiesIngestor(/* whatever */);

Properties deleteFileRequestProperties = new Properties();
deleteFileRequestProperties.setProperty("delFile.bucket", "mybucket");
deleteFileRequestProperties.setProperty("delFile.objectKey",
"objectWaitingForDeletion");

List<Properties> propsList = new LinkedList<Properties>();
propsList.add(deleteFileRequestProperties);

ingestor.ingest(propsList);
-----

Then, the DeleteFileService is informed of which file to delete by
getting these params' values in the standard way.

What do you think of this idea?

[BTW, In the current code base it is somewhat inconvenient to subclass
IngestorBase and provide a new ingesting method because all
IngestorBase's members are private.]

.. Shlomo

David Kavanagh

Aug 20, 2008, 11:55:00 AM
to lifegu...@googlegroups.com
Let me go look at that. I actually need to look at that now, so the
timing is good. I have a need to ingest with files already in S3, so
it doesn't need to move anything, just send messages. So, we have 3
use cases.

David

David Kavanagh

Aug 20, 2008, 1:12:56 PM
to lifegu...@googlegroups.com
BTW, I'm creating methods in IngestorBase that look like this:

/**
 * This method takes a list of properties and submits work requests
 * for each. This is designed for workflows that don't require an
 * initial input file; rather, some properties unique to each request
 * define the inputs.
 *
 * @param properties the list of properties to use
 */
public void ingest(List<Properties> properties) {
}

/**
 * This method takes a list of keys of objects in S3 and submits work
 * requests for each.
 *
 * @param keys the list of S3 object keys to ingest
 */
public void ingest(List<String> keys) {
}

/**
 * This method takes a list of files, moves the files to S3 and
 * submits work requests for each.
 *
 * @param files the list of files to ingest
 */
public void ingest(List<File> files) {
}

I'm re-factoring the code so there is as much re-use as possible
between these methods.
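
One Java detail to watch with these signatures: generic type arguments are erased at compile time, so `ingest(List<Properties>)`, `ingest(List<String>)`, and `ingest(List<File>)` all have the same erasure, `ingest(List)`, and the compiler rejects them as a name clash when they are declared in the same class. A minimal sketch of one way around this is to give each entry point its own name. The names `ingestProperties`, `ingestKeys`, and `ingestFiles` are hypothetical, as is the `submit` helper that stands in for sending a work request.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustration only: three entry points with distinct names and a shared tail. */
final class IngestNamingSketch {

    /** Records what would have been submitted; stands in for sending work requests. */
    final List<String> submitted = new ArrayList<>();

    /** One request per Properties object, for workflows with no input file. */
    void ingestProperties(List<Properties> properties) {
        for (Properties props : properties) {
            submit("props:" + props.stringPropertyNames().size());
        }
    }

    /** One request per key of an object that is already in S3. */
    void ingestKeys(List<String> keys) {
        for (String key : keys) {
            submit("key:" + key);
        }
    }

    /** One request per local file (the upload to S3 is omitted in this sketch). */
    void ingestFiles(List<File> files) {
        for (File file : files) {
            submit("file:" + file.getName());
        }
    }

    /** Shared tail of all three entry points. */
    private void submit(String description) {
        submitted.add(description);
    }

    public static void main(String[] args) {
        IngestNamingSketch ingestor = new IngestNamingSketch();
        List<String> keys = new ArrayList<>();
        keys.add("some/object/key");
        ingestor.ingestKeys(keys);
        System.out.println(ingestor.submitted);
    }
}
```

Funnelling every entry point into one shared method is also what keeps the code re-use high across the three cases.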

David


shlomo....@gmail.com

Aug 20, 2008, 1:24:23 PM
to lifegu...@googlegroups.com
I think there is an opportunity here to abstract out common code from
AbstractBaseService and IngestorBase. They both perform the same task,
"prepare message for the next (first) service in the workflow", and
both need to handle the three above-mentioned cases, where the next
service requires Files, S3 keys, or Properties.

dkav...@gmail.com

Aug 28, 2008, 6:47:57 PM
to lifeguard-dev
I haven't re-factored the code, but I did get a lot of functionality
into the commit I just did. I think there are enough subtle
differences between ingest and the base service class that
re-factoring wasn't an obvious choice.

David


shlomo

Aug 31, 2008, 1:23:40 PM
to lifeguard-dev
The latest svn version has new methods on IngestorBase, all forwarding
to an internal protected method ingest(List<MetaFile> files,
List<Properties> properties, Map<String, String> outputKeys).

But the implementation of that method ignores the List<Properties>
parameter, and handling that parameter is a key part of supporting
Properties-based ingestion.

I imagine the List<Properties> would be handled in a foreach
statement. Something that, for each Properties, fires off a new
WorkRequest with the List<ParamType> params containing the key-value
pairs in the Properties object in addition to (and overriding) the
params that were accumulated from the services.
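
The foreach described above might look roughly like the following. This is a sketch under stated assumptions, not the code in SVN: params are modeled as a plain `Map<String, String>` instead of `List<ParamType>`, and `ParamMergeSketch`, `merge`, and `paramsPerRequest` are hypothetical names. Each Properties object yields one param set in which the request's key-value pairs are added to, and take precedence over, the params accumulated from the services.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Illustration only: build one param set per Properties object. */
final class ParamMergeSketch {

    /** Returns the service params plus the request's properties; properties win on conflict. */
    static Map<String, String> merge(Map<String, String> serviceParams, Properties requestProps) {
        Map<String, String> merged = new LinkedHashMap<>(serviceParams);
        for (String key : requestProps.stringPropertyNames()) {
            merged.put(key, requestProps.getProperty(key));
        }
        return merged;
    }

    /** Builds one merged param set per Properties object, i.e. one per work request. */
    static List<Map<String, String>> paramsPerRequest(Map<String, String> serviceParams,
                                                      List<Properties> propertiesList) {
        List<Map<String, String>> requests = new ArrayList<>();
        for (Properties props : propertiesList) {
            requests.add(merge(serviceParams, props));
        }
        return requests;
    }

    public static void main(String[] args) {
        Map<String, String> serviceParams = new LinkedHashMap<>();
        serviceParams.put("delFile.bucket", "defaultbucket");

        Properties request = new Properties();
        request.setProperty("delFile.bucket", "mybucket");
        request.setProperty("delFile.objectKey", "objectWaitingForDeletion");

        List<Properties> propertiesList = new ArrayList<>();
        propertiesList.add(request);
        System.out.println(paramsPerRequest(serviceParams, propertiesList));
    }
}
```

Because `merge` copies the service params for every request, one request's overrides never leak into the next.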

I can work around this by overriding the ingest method in my derived
class.

Will you be fixing this issue? Should I report a bug?


David Kavanagh

Aug 31, 2008, 1:53:23 PM
to lifegu...@googlegroups.com
Oops. I may have forgotten to wire that up, but I agree that it is
important. Let me fix that!

dkav...@gmail.com

Sep 5, 2008, 8:55:39 AM
to lifeguard-dev
Shlomo, I just committed some code that ought to work for you (r84 in
SVN). Please let me know if there's a problem. I'd also be happy to
accept patches if you have fixes to include. I'm pretty busy and just
haven't had the time to test this fully.

David


shlomo

Sep 8, 2008, 3:57:04 PM
to lifeguard-dev
David - thanks! This commit indeed allows me to rely on IngestorBase
completely in my use case of ingesting Properties.

Thanks!

.. Shlomo