Hi Justin,
Glad you like Jug!
> Something that I'd like to do is take many computation results (more data than will fit in memory) and one by one write them to a single file (HDF5 in my case).
>
> The naive solution was to write something like:
>
> def write_stuff(task_results, output_file):
>     for task_result in task_results:
>         output_file.write(task_result)
>
>     return output_file
There is an alternative, which is to "look under the hood" and use jug.io.NoLoad (this is not documented, but it should work).
Assuming your script looks something like:
...
write_stuff(task_results, 'output.hdf5')
You transform it to:
from jug.io import NoLoad
task_results_no_load = [NoLoad(r) for r in task_results]
write_stuff(task_results_no_load, 'output.hdf5')
Now, `write_stuff` will be called with the Task objects, so you need to load/unload explicitly:
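from jug import TaskGenerator

@TaskGenerator
def write_stuff(task_results, output_file):
    for task_result in task_results:
        # Load one result at a time so only one is in memory.
        val = task_result.load()
        output_file.write(val)
        # Release it before moving on to the next Task.
        task_result.unload()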
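If you want to convince yourself that this pattern keeps only one result in memory at a time, here is a rough sketch that runs without jug. FakeTask is a hypothetical stand-in I made up for illustration (it is not part of jug, and its load() returning the value is an assumption of the sketch), and an in-memory text buffer plays the role of the HDF5 file:

```python
import io


class FakeTask:
    """Hypothetical stand-in for a jug Task; NOT part of jug."""

    live = 0  # number of results currently held in memory
    peak = 0  # highest value `live` ever reached

    def __init__(self, n):
        self.n = n
        self._result = None

    def load(self):
        # Pretend to read the stored result back from disk.
        self._result = [self.n] * 3
        FakeTask.live += 1
        FakeTask.peak = max(FakeTask.peak, FakeTask.live)
        return self._result

    def unload(self):
        # Drop the in-memory copy so it can be garbage collected.
        self._result = None
        FakeTask.live -= 1


def write_stuff(tasks, out):
    for task in tasks:
        val = task.load()  # bring exactly one result into memory
        out.write(",".join(map(str, val)) + "\n")
        task.unload()  # release it before touching the next one
    return out


tasks = [FakeTask(i) for i in range(4)]
buf = write_stuff(tasks, io.StringIO())
```

After this runs, FakeTask.peak is 1: no matter how many tasks you pass in, at most one result was loaded at any moment, which is the whole point of doing the load/unload by hand.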
HTH
Luis