Optional output ports


Alexander Kanitz

Apr 15, 2015, 5:36:08 AM
to andur...@googlegroups.com
Dear Anduril developers,

We are currently implementing an Anduril-based general-purpose bioinformatics analysis pipeline for use in our lab. For this, I am writing a 'component generator' script that reads inputs, outputs and other parameters from a CSV file and generates the XML descriptor file, along with a sort of one-size-fits-all Python wrapper. The wrapper includes a command template, performs input/output/parameter validation (to some extent), renders and sanitizes the command template, and executes it either locally or remotely (DRMAA & Univa/Oracle/Sun Grid Engine), given a set of mandatory parameters for handling resources (CPUs, memory, etc.), with fairly detailed error handling. So far it's working quite nicely, with one annoying exception:

Anduril does not seem to allow/handle *optional* output ports. I can imagine that this is meant to reduce the chance of building workflows in which components rely on inputs that are never produced (not even as empty files). Still, it is quite a limitation, because there are scripts (my first real test component, 'CutAdapt', for example) that not only have several optional outputs, but also automatically set and/or require certain settings when a given output option is set (e.g. when an output option for paired-end results is specified, the script requires a second input file and internally sets certain flags). But since Anduril always produces arguments for all output ports (by default: executionDir + outputPortName), it is not really possible to determine dynamically whether an optional output port was specified in the workflow (in which case the corresponding option should be kept in the command template) or not (in which case the whole option, including its argument, should be removed).

My current workaround is to generate a boolean parameter for each optional output. (I allow and require certain other attributes that are not written to the XML, but are contained in a third JSON output file that our pipeline backend can deal with.) The parameter can be set in the workflow and is interrogated when rendering/sanitizing the command template, so that the optional output port's option/argument pair is only included in the command when the corresponding boolean parameter is true. This works, but is tedious and ugly. My question is therefore whether there is a more elegant solution. If not, my suggestion would be to support optional outputs in future releases.
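A minimal sketch of the boolean-flag workaround described above, in Python. The function name `render_command`, the template layout, the tool name `sometool` and the option strings `--out` and `--paired-out` are all hypothetical placeholders; they are not taken from the actual wrapper or from any real tool.

```python
def render_command(template, values, flags):
    """Build an argument list, dropping optional option/argument pairs.

    template: list of (option, value_key, flag_key) tuples. flag_key is None
              for mandatory parts; otherwise it names the boolean parameter
              that switches the optional output on (hypothetical layout).
    values:   maps value_key to the argument (e.g. the path Anduril assigns
              to an output port).
    flags:    maps flag_key to True/False as set in the workflow.
    """
    argv = []
    for option, value_key, flag_key in template:
        # Optional output not requested: skip the option AND its argument.
        if flag_key is not None and not flags.get(flag_key, False):
            continue
        if option:
            argv.append(option)
        argv.append(str(values[value_key]))
    return argv


# Hypothetical template for a tool with one optional paired-end output.
TEMPLATE = [
    ("", "executable", None),
    ("--out", "out_main", None),
    ("--paired-out", "out_paired", "use_out_paired"),
    ("", "in_reads", None),
]

VALUES = {
    "executable": "sometool",
    "out_main": "exec/out_main",      # path always supplied by the engine
    "out_paired": "exec/out_paired",  # also always supplied, even if unused
    "in_reads": "reads.fastq",
}

single = render_command(TEMPLATE, VALUES, {"use_out_paired": False})
paired = render_command(TEMPLATE, VALUES, {"use_out_paired": True})
```

Returning a list rather than a joined string keeps each argument separate, so the result could be handed to something like `subprocess.run` without shell quoting concerns.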

Talking about the future: I was getting a bit worried about the recent lack of Anduril news and updates and was thus excited to hear, in another question, that Anduril 2.x is on its way! Is there a chance you could supply a little outlook on new functionality as a teaser (perhaps better in a separate thread or in the news section)? This might allow me and other users to bear in mind the coming changes when developing for projects relying on Anduril.

Thanks for your help and kind regards,
Alex

PS: I have seen something here in the questions about a 'component-generator', but could find neither the executable nor any in-depth documentation...

Ville Rantanen

Apr 15, 2015, 5:57:04 AM
to andur...@googlegroups.com
Hi, and thanks for the interest! 

About the optional output ports: the current design (and Anduril2) will not really allow this. In cases where we need something similar, we typically use array outputs.
A component can define that its output array always has a key named "something", and in optional cases other keys may be present.
If an output array key is missing, Anduril will give you an error if you try to use it in a downstream component.

Something like:

comp=ComponentGeneratingOptionalOutputs( input=data, method="justTheDefault" )

other=SomeOtherComponent( input=comp.output["default"] )
secondOther=SomeOtherComponent( input=comp.output["optional"] )

Here, if the "optional" key is not found in the output array, an error is raised.
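From the component side, the idea could be sketched in Python roughly as follows. This assumes that an array output is a directory containing a tab-separated `_index` file with `Key` and `File` columns; that layout is an assumption here, not something stated in this thread, and the helper names `write_array_index` and `read_array_key` are hypothetical.

```python
import csv
import os


def write_array_index(array_dir, entries):
    """Write an index listing the keys actually produced by the component.

    entries maps key -> file name inside array_dir. Optional results are
    simply left out of the mapping when they were not produced.
    """
    os.makedirs(array_dir, exist_ok=True)
    with open(os.path.join(array_dir, "_index"), "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["Key", "File"])  # assumed column names
        for key, filename in entries.items():
            writer.writerow([key, filename])


def read_array_key(array_dir, key):
    """Return the path stored under key, or raise KeyError if it is absent."""
    with open(os.path.join(array_dir, "_index"), newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["Key"] == key:
                return os.path.join(array_dir, row["File"])
    raise KeyError("array key not found: " + key)
```

With this shape, a component run that only produces the default result would list only "default" in the index, and a lookup of "optional" fails loudly instead of silently passing a missing file downstream.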


There are components, like GNUPlot, that use a similar approach. However, in that component the users themselves can choose the names of the output keys.

To answer the second question, about the future: the building of the second generation of Anduril has been under way for a year now, and it is already in a usable state.
However, we still expect some changes quite close to the core, and have therefore not said much about how it works, because it's not entirely stable yet.

In short, the major change is that the AndurilScript language is gone. We wanted to give the user the freedom of a full programming/scripting language, with full capability for maths, string manipulation etc., without having to implement everything ourselves. The pipelines in Anduril 2.x will be constructed in Scala.
Because we use an external scripting language, we have to limit the core functionality a little bit, but in return the user can do pretty much anything during the building of the component network.


The component-generator you speak of may be obsolete with Anduril2, since you can actually introduce new components inline within the pipeline.