Hi David,
> My use case is an arbitrary parallel execution of independent matlab jobs.
> I can write the main Application object and things like that, but my use
> case would require submitting the jobs inside a python object. Keep this
> object around and query completed tasks once they're done programmatically.
> And keep track of jobs that have erred or encountered other problems.
I would do it like this, in (incomplete) Python code:
```
from os.path import basename

from gc3libs import ANY_OUTPUT, Application, create_engine
from gc3libs.workflow import ParallelTaskCollection

# 1. create an `Engine` object to run tasks
engine = create_engine()

# 2. create applications to process all input files
input_files = ['file1', 'file2', ...]
apps = [
    Application(
        ['process', basename(filename)],
        inputs=[filename],
        outputs=ANY_OUTPUT,
        output_dir=(basename(filename) + '.out'),  # one output dir per task
        ...
    )
    for filename in input_files
]

# 3. bundle them all in a `ParallelTaskCollection`
top = ParallelTaskCollection(apps)

# 4. run the task collection through the Engine
engine.submit(top)
while top.execution.state != 'TERMINATED':
    engine.progress()

# 5. collect tasks that errored out
failed = [task for task in top.tasks
          if task.execution.returncode != 0]
```
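Either way, since you keep `top` (and the `apps` list) around in your Python process, you can query per-task states at any point between `engine.progress()` calls. A small helper like the following tallies them; the helper and the stand-in tasks are illustrative, only the `.execution.state` attribute comes from gc3libs:

```python
from collections import Counter
from types import SimpleNamespace

def state_tally(tasks):
    """Map each execution state to the number of tasks currently in it."""
    return Counter(task.execution.state for task in tasks)

# Stand-in tasks, for illustration only (real gc3libs tasks
# expose the same `.execution.state` attribute):
tasks = [SimpleNamespace(execution=SimpleNamespace(state=s))
         for s in ['RUNNING', 'RUNNING', 'TERMINATED']]

tally = state_tally(tasks)
print(tally['RUNNING'], tally['TERMINATED'])  # 2 1
```

Calling this inside the `while` loop gives you a running progress report without waiting for the whole collection to finish.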
Use of the `ParallelTaskCollection` is actually optional: you could
submit all the tasks in a `for`-loop, but then you would have to check
the status of each task individually. E.g., you could replace steps 3.
and 4. with:
```
done = 0
for app in apps:
    engine.submit(app)
while done < len(apps):
    engine.progress()
    done = len([app for app in apps
                if app.execution.state == 'TERMINATED'])
```
(In my opinion, this latter code would only make sense if you have to
break out of the loop early, e.g., when 80% of the tasks have
terminated successfully, or when a few critical ones have terminated
unsuccessfully...)
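Such a break condition could be sketched like this; `should_stop`, the `critical` flag, and the 80% threshold are all illustrative assumptions, not part of gc3libs, and stand-in objects replace real tasks so the sketch runs on its own:

```python
from types import SimpleNamespace

def should_stop(tasks, success_fraction=0.8):
    """True once `success_fraction` of tasks succeeded, or a critical one failed.

    `tasks` is any iterable of objects exposing `.execution.state`,
    `.execution.returncode`, and a (hypothetical) `.critical` flag.
    """
    done = [t for t in tasks if t.execution.state == 'TERMINATED']
    if any(t.critical and t.execution.returncode != 0 for t in done):
        return True
    succeeded = [t for t in done if t.execution.returncode == 0]
    return len(succeeded) >= success_fraction * len(tasks)

def mock_task(state, returncode=0, critical=False):
    # Stand-in for a gc3libs task, for illustration only.
    return SimpleNamespace(
        execution=SimpleNamespace(state=state, returncode=returncode),
        critical=critical)

tasks = ([mock_task('TERMINATED') for _ in range(8)]
         + [mock_task('RUNNING'), mock_task('RUNNING')])
print(should_stop(tasks))  # True: 8 of 10 tasks terminated successfully
```

You would call `should_stop(apps)` after each `engine.progress()` and `break` when it returns `True`.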
Does this answer your question?
Ciao,
R