Hello,
The upcoming release 0.22.0 will bring some long-awaited improvements to the nextflow caching mechanism.
It will include two new commands: `log` and `clean`.
The `log` command replaces the `history` command, which will be deprecated.
In its simplest form, the `log` command prints the list of pipelines executed in the current folder. For example:
$ nextflow log
TIMESTAMP            RUN NAME          SESSION ID                             COMMAND
2016-08-01 11:44:51  grave_poincare    18cbe2d3-d1b7-4030-8df4-ae6c42abaa9c   nextflow run hello
2016-08-01 11:44:55  small_goldstine   18cbe2d3-d1b7-4030-8df4-ae6c42abaa9c   nextflow run hello -resume
2016-08-01 11:45:09  goofy_kilby       0a1f1589-bd0e-4cfc-b688-34a03810735e   nextflow run rnatoy -with-docker
Starting with this version, each pipeline execution is assigned a unique `name` that helps to identify multiple runs of the same pipeline. The name is either generated automatically by nextflow or provided by the user on the `run` command line.
Using the run name (or the session id), it is possible to inspect the tasks executed by that pipeline run. For example:
$ nextflow log goofy_kilby
/Users/../work/0b/be0d1c4b6fd6c778d509caa3565b64
/Users/../work/ec/3100e79e21c28a12ec2204304c1081
/Users/../work/7d/eb4d4471d04cec3c69523aab599fd4
/Users/../work/8f/d5a26b17b40374d37338ccfe967a30
/Users/../work/94/dfdfb63d5816c9c65889ae34511b32
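To make the relationship between run names and session ids concrete, here is a minimal Python sketch (not nextflow's implementation, which is written in Groovy) that models the history printed by `nextflow log` as a list of records and resolves either a run name or a session id to the matching runs. `Run`, `HISTORY` and `find_runs` are hypothetical names used only for illustration. In this sketch a session id can match more than one run: in the listing above, `grave_poincare` and `small_goldstine` share the same session id, the second run having been launched with `-resume`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    # One row of the history printed by `nextflow log`
    timestamp: str
    run_name: str
    session_id: str
    command: str


# Sample history copied from the `nextflow log` listing above
HISTORY = [
    Run("2016-08-01 11:44:51", "grave_poincare",
        "18cbe2d3-d1b7-4030-8df4-ae6c42abaa9c", "nextflow run hello"),
    Run("2016-08-01 11:44:55", "small_goldstine",
        "18cbe2d3-d1b7-4030-8df4-ae6c42abaa9c", "nextflow run hello -resume"),
    Run("2016-08-01 11:45:09", "goofy_kilby",
        "0a1f1589-bd0e-4cfc-b688-34a03810735e", "nextflow run rnatoy -with-docker"),
]


def find_runs(history: List[Run], key: str) -> List[Run]:
    """Hypothetical lookup: match `key` against the run name or the session id."""
    return [r for r in history if key in (r.run_name, r.session_id)]


if __name__ == "__main__":
    # A run name identifies a single execution...
    print([r.command for r in find_runs(HISTORY, "goofy_kilby")])
    # ...while here a session id matches a run and its resumed execution
    print([r.run_name for r in find_runs(HISTORY, "18cbe2d3-d1b7-4030-8df4-ae6c42abaa9c")])
```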
By default, the `log` command prints the task execution paths. However, the `-f` command line option lets you provide a custom list of fields to print. For example:
$ nextflow log goofy_kilby -f hash,name,exit,status
0b/be0d1c buildIndex (ggal_1_48850000_49020000.Ggal71.500bpflank) 0 COMPLETED
ec/3100e7 mapping (ggal_gut) 0 COMPLETED
7d/eb4d44 mapping (ggal_liver) 0 COMPLETED
8f/d5a26b makeTranscript (ggal_liver) 0 COMPLETED
94/dfdfb6 makeTranscript (ggal_gut) 0 COMPLETED
The fields accepted by the `-f` option are those included in the trace report, plus the following: `script`, `stdout`, `stderr` and `env`.
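Conceptually, `-f` projects each task record onto the requested fields, in the order given. The Python sketch below illustrates that idea under stated assumptions: a task is modelled as a plain dict built from the first line of the example above, the output separator is a tab, and `TASK` and `select_fields` are hypothetical names. It is not how nextflow stores or prints its records.

```python
from typing import Dict, List

# Hypothetical task record built from the first line of the -f example above
TASK: Dict[str, object] = {
    "hash": "0b/be0d1c",
    "name": "buildIndex (ggal_1_48850000_49020000.Ggal71.500bpflank)",
    "exit": 0,
    "status": "COMPLETED",
}


def select_fields(task: Dict[str, object], spec: str) -> str:
    """Render the comma-separated field list `spec` (as given to -f) for one task."""
    fields: List[str] = [f.strip() for f in spec.split(",")]
    unknown = [f for f in fields if f not in task]
    if unknown:
        raise KeyError("unknown field(s): " + ", ".join(unknown))
    # Fields are joined with a tab in this sketch; the real separator may differ
    return "\t".join(str(task[f]) for f in fields)


if __name__ == "__main__":
    print(select_fields(TASK, "hash,name,exit,status"))
    print(select_fields(TASK, "status,hash"))  # output order follows the spec
```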
The printed log can be customised further with the `-t` option, which accepts a template (either a string or a file). This makes it possible to create complex custom reports in any text-based format. For example, you could save the following markdown snippet to a file:
## $name
script:
$script
exit status: $exit
task status: $status
task folder: $folder
Then, passing that file to `nextflow log` with the `-t` option produces markdown output containing the script, exit status and folder of every executed task.
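As a rough illustration of what a `-t` template does, the sketch below fills the markdown template once per task and concatenates the results. It relies on Python's `string.Template`, which happens to use the same `$field` placeholder syntax; nextflow's own template engine is not shown here and may behave differently (for example on missing fields, which `substitute` rejects with a `KeyError`). The task record is hypothetical: its name and folder come from the examples above, while the script body is deliberately omitted.

```python
from string import Template

# The markdown template from the post, one placeholder per log field
REPORT_TEMPLATE = Template("""\
## $name
script:
$script
exit status: $exit
task status: $status
task folder: $folder
""")


def render_report(tasks):
    """Fill the template once per task and join the results."""
    return "\n".join(REPORT_TEMPLATE.substitute(task) for task in tasks)


if __name__ == "__main__":
    tasks = [{
        "name": "mapping (ggal_gut)",
        "script": "# (script body omitted)",
        "exit": 0,
        "status": "COMPLETED",
        "folder": "/Users/../work/ec/3100e79e21c28a12ec2204304c1081",
    }]
    print(render_report(tasks))
```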
The `-filter` option makes it possible to select which entries are included in the log report. Any valid Groovy boolean expression over the log fields can be used as the filter condition. For example:
$ nextflow log goofy_kilby -filter 'name =~ /foo.*/ && status == "FAILED"'
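In Groovy, `=~` used as a condition is true when the pattern is found anywhere in the string, so `name =~ /foo.*/` behaves like a search rather than a full match. The sketch below writes the same condition as a Python predicate applied to each task record; it only illustrates the semantics and is not how nextflow evaluates filter expressions. `foo_failed`, `filter_tasks` and the sample records are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, List

Task = Dict[str, object]


def foo_failed(task: Task) -> bool:
    # Python counterpart of: name =~ /foo.*/ && status == "FAILED"
    # re.search mirrors Groovy's find semantics (match anywhere in the string)
    return re.search(r"foo.*", str(task["name"])) is not None and task["status"] == "FAILED"


def filter_tasks(tasks: Iterable[Task], predicate: Callable[[Task], bool]) -> List[Task]:
    """Keep only the log entries for which the predicate holds."""
    return [t for t in tasks if predicate(t)]


if __name__ == "__main__":
    # Hypothetical task records used only to exercise the filter
    tasks = [
        {"name": "foo (1)", "status": "FAILED"},
        {"name": "foo (2)", "status": "COMPLETED"},
        {"name": "barfoo", "status": "FAILED"},
        {"name": "bar", "status": "FAILED"},
    ]
    for t in filter_tasks(tasks, foo_failed):
        print(t["name"])  # prints "foo (1)" then "barfoo"
```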
The `clean` command allows you to delete cached work directories by specifying the run name or a session id. For example:
$ nextflow clean goofy_kilby
The above command deletes the task directories of the run named `goofy_kilby`. The special name `last` can be used to delete the tasks of the last pipeline run.
The options `-before`, `-after` and `-cut` can be used to select a set of runs to delete. For example:
$ nextflow clean -before last
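A minimal sketch of how a selector such as `-before last` could be resolved against the chronological run history. It assumes that `-before` selects the runs strictly earlier than the reference run and `-after` the runs strictly later; whether the reference run itself is included is an assumption, as are the helper names `resolve` and `select_runs`. `-cut` is left out because its semantics are not described here.

```python
from typing import List, Optional

# Run names in chronological order, as listed by `nextflow log`
HISTORY = ["grave_poincare", "small_goldstine", "goofy_kilby"]


def resolve(history: List[str], name: str) -> int:
    """Return the index of a run; the special name 'last' means the most recent run."""
    if name == "last":
        if not history:
            raise ValueError("no runs in history")
        return len(history) - 1
    return history.index(name)  # raises ValueError for unknown names


def select_runs(history: List[str], before: Optional[str] = None,
                after: Optional[str] = None) -> List[str]:
    """Hypothetical selector: runs strictly before and/or strictly after a reference run."""
    lo, hi = 0, len(history)
    if before is not None:
        hi = resolve(history, before)
    if after is not None:
        lo = resolve(history, after) + 1
    return history[lo:hi]


if __name__ == "__main__":
    print(select_runs(HISTORY, before="last"))  # ['grave_poincare', 'small_goldstine']
```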
Finally, the trace report will also include an entry for each cached task that was part of a pipeline execution. For example:
task_id  hash       tag  name          attempt  status     exit
1        70/84f82a  -    sayHello (1)  1        CACHED     0
4        21/3b7aca  -    sayHello (4)  1        CACHED     0
3        b3/b67279  -    sayHello (3)  1        CACHED     0
2        5e/4479aa  -    sayHello (2)  1        COMPLETED  0
However, I'm not sure whether the status field in this report should be reported as 'CACHED', as in this example, or as 'COMPLETED', with a new `cached` column reporting true|false.
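To compare the two options, the following sketch converts trace rows from the first layout (status `CACHED`) into the second (status `COMPLETED` plus a `cached` column holding `true` or `false`). It assumes the trace is stored as tab-separated values and that every cached task had completed successfully; `to_cached_column` is a hypothetical helper, not part of nextflow.

```python
import csv
import io
from typing import Dict, List

# The trace excerpt above, assumed to be stored as tab-separated values
TRACE = (
    "task_id\thash\ttag\tname\tattempt\tstatus\texit\n"
    "1\t70/84f82a\t-\tsayHello (1)\t1\tCACHED\t0\n"
    "4\t21/3b7aca\t-\tsayHello (4)\t1\tCACHED\t0\n"
    "3\tb3/b67279\t-\tsayHello (3)\t1\tCACHED\t0\n"
    "2\t5e/4479aa\t-\tsayHello (2)\t1\tCOMPLETED\t0\n"
)


def to_cached_column(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map status CACHED to COMPLETED and record it in a new `cached` column."""
    converted = []
    for row in rows:
        is_cached = row["status"] == "CACHED"
        new_row = dict(row, cached="true" if is_cached else "false")
        if is_cached:
            new_row["status"] = "COMPLETED"  # assumes cached tasks had completed
        converted.append(new_row)
    return converted


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(TRACE), delimiter="\t"))
    for row in to_cached_column(rows):
        print(row["task_id"], row["name"], row["status"], row["cached"])
```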
You can try these new features by defining the following environment variable:
export NXF_VER=0.22.0-SNAPSHOT
Comments are welcome.
Cheers,
Paolo