Generate output files in exec block

Lukas Jelonek

Mar 18, 2015, 5:41:33 AM
to next...@googlegroups.com
Hey everybody,

I encountered the following problem: I wanted to create a text file in an exec block like this:

process createFile {

    output:
    file 'file' into files

    exec:
    new File('file').withWriter{
        it.println 'hello world'
    }

}

I get the following output:

N E X T F L O W  ~  version 0.12.4
[warm up] executor > local
[54/d1583d] Submitted process > createFile (1)
Error executing process > 'createFile (1)'

Caused by:
  Missing output file(s): 'file' expected by process: createFile (1)


Source block:
     new File('file').withWriter{
  it.println 'hello world'
     }
The file was not created in the process working directory, but in my current working directory.

As a workaround I used a script block with a shebang line that selects the Groovy interpreter:

process createFile {

    output:
    file 'file' into files

    script:
    '''
    #!/usr/bin/env groovy
    new File('file').withWriter{
        it.println 'hello world'
    }
    '''

}

Here the file is created in the process working directory and everything works fine.

So my question is: What would be the right way to generate a file in an 'exec:' block? Or should it be avoided?

Greetings,
Lukas

SveinT

Mar 18, 2015, 5:59:31 AM
to next...@googlegroups.com
Hi,

as you can see here, script/exec are interchangeable and optional (if I understood correctly). So you may just leave it out.

Hence any code outside of the ''' ''' block is actually running inside the Nextflow process. Anything inside the ''' ''' block is run as a separate process, either on your local system or on a compute cluster. What Nextflow does is take the part inside the ''' ''' block and create a shell script from it behind the scenes (have a look inside the work folder to understand this better).

So it really boils down to what you want to do. If you need to create a file before the process starts, you put the code outside the whole process {} block, i.e. at the top of your Nextflow file. If you want to create a file as part of your process, you put it inside the """ """ block, like in your last example. Note that you don't have to use Groovy code in there; it could just as well be: echo "Hello World" > somefile

Hope this helps!

SveinT

Mar 18, 2015, 6:03:04 AM
to next...@googlegroups.com
Just a follow up. Your question sort of implies a lack of general understanding of how Nextflow works, so I would advise you to go through the examples and documentation thoroughly before continuing. 

Lukas Jelonek

Mar 18, 2015, 6:34:30 AM
to next...@googlegroups.com
Hey SveinT,

thanks for your answer, although I am not yet sure if it helps ;)


On Wednesday, March 18, 2015 at 10:59:31 UTC+1, SveinT wrote:
> as you can see here, script/exec are interchangeable and optional (if I understood correctly). So you may just leave it out.

If they were interchangeable then there shouldn't be any problem in using either one of them, but obviously the exec: environment has different semantics.
 
> Hence any code outside of the ''' ''' block is actually running inside the Nextflow process. Anything inside the ''' ''' block is run as a separate process, either on your local system or on a compute cluster. What Nextflow does is take the part inside the ''' ''' block and create a shell script from it behind the scenes (have a look inside the work folder to understand this better).

That is clear to me. My problem is that I assumed that inside a process block I am working in a separate working directory for each execution of the process, and that the environment is set up such that this working directory is the current working directory. This works fine for script: blocks, but not for exec: blocks. For an exec: block the current working directory is the directory from which I started the Nextflow workflow.
 
> So it really boils down to what you want to do. If you need to create a file before the process starts, you put the code outside the whole process {} block, i.e. at the top of your Nextflow file. If you want to create a file as part of your process, you put it inside the """ """ block, like in your last example. Note that you don't have to use Groovy code in there; it could just as well be: echo "Hello World" > somefile
 
The hello world example is just an example that boils the problem down. Actually I have a slightly more complex Groovy script that prepares some data before handing it over to other processes.

My conclusion for now is to use script: when I want to work in the process work directory with a correctly set up environment, and to use exec: when I do some computation that does not create any files, since there the working directory is set up differently.

Greetings,
Lukas

Paolo Di Tommaso

Mar 18, 2015, 6:53:46 AM
to nextflow
Hi Lukas, 

You are right. This is a known problem: the code in an exec block is executed in the same JVM instance as the main Nextflow process, so it is not possible to change the current working directory to the task working directory.
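
A minimal sketch of that behaviour in plain Groovy, outside Nextflow. The temporary directory is only a stand-in for a task working directory (an assumption of this example); inside a process the directory would come from task.workDir.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical stand-in for a task working directory.
Path taskDir = Files.createTempDirectory('task-')

// A relative File is resolved against the directory the JVM was started in
// (the user.dir system property), which every thread in the JVM shares.
File relative = new File('file')
println "relative File resolves to: ${relative.absolutePath}"

// Resolving against an explicit directory does not depend on user.dir.
Path explicit = taskDir.resolve('file')
Files.write(explicit, 'hello world\n'.getBytes('UTF-8'))
println "explicit Path resolves to: ${explicit.toAbsolutePath()}"
```

The first path points into the launch directory and the second into the stand-in task directory, which matches the symptom reported at the top of the thread.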

I'm still struggling to find a solution to that. So far the only option would be to execute that code in a separate JVM instance, but that would add too much overhead to short-lived processes.

The current solution is to create the file relative to the task working directory, which is accessible through the variable "task.workDir". For example:


process foo {
  output:
  file 'file' into files
  
  exec: 
  task.workDir.resolve('file').text = 'Hello world!'
}

files.subscribe { println it.text }


Also note that the underlying files are implemented as Path objects, not File objects.
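
As a rough illustration of working with Path instead of File, here is a plain Groovy sketch that uses only java.nio.file calls. The workDir variable is a hypothetical stand-in for task.workDir, so the snippet runs outside Nextflow.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical stand-in for task.workDir.
Path workDir = Files.createTempDirectory('work-')
Path target = workDir.resolve('file')

// java.nio counterpart of new File('file').withWriter { ... }
Files.newBufferedWriter(target).withCloseable { writer ->
    writer.write('hello world')
    writer.newLine()
}

// If an API insists on java.io.File, a Path on the default file system
// can be converted with toFile().
File legacy = target.toFile()
println legacy.text
```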


 

Enjoy. 

p



SveinT

Mar 18, 2015, 6:57:23 AM
to next...@googlegroups.com
Hi again,

sorry, it's apparently me being a bit confused, not you!

I took it from the documentation that they were equal (and optional due to the [ ]), since I thought I had used it like that before, but I was wrong.

The script statement is indeed optional though, if you provide a """ """ block.

I guess the exec statement is only useful when running things locally and not outputting files as you said. It doesn't create any .command.sh files like script processes, so I assume it just forks a new process inside the Nextflow process. I guess Paolo can clarify regarding when it's useful.

Using 'script' sounds like the correct way.

Lukas Jelonek

Mar 18, 2015, 7:18:10 AM
to next...@googlegroups.com
Hey Paolo,

thanks for the answer. This is exactly the information I needed :) Maybe this information can be added to the documentation.

I can't imagine a solution to this problem either. If it were single-threaded, I would suggest simply changing the directory, but I assume that this won't work and that you would need to handle it separately for each thread.
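
To make the per-thread point concrete, here is a small plain Groovy sketch with no Nextflow involved. The thread pool and the temporary directories are assumptions of the example. Each task carries its own directory and resolves its output against it, so nothing relies on a shared current working directory.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.util.concurrent.Callable
import java.util.concurrent.Executors

def pool = Executors.newFixedThreadPool(4)

// Build four independent jobs; each one creates and uses its own directory.
def tasks = (1..4).collect { n ->
    def job = {
        Path dir = Files.createTempDirectory('task-' + n + '-')
        Path out = dir.resolve('file')
        Files.write(out, ('hello from task ' + n).getBytes('UTF-8'))
        return out
    } as Callable<Path>
    job
}

// invokeAll returns the futures in the same order as the submitted tasks.
List<Path> outputs = pool.invokeAll(tasks)*.get()
pool.shutdown()

outputs.each { println "${it} -> ${new String(Files.readAllBytes(it), 'UTF-8')}" }
```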

Thanks so far,
Lukas

Paolo Di Tommaso

Mar 18, 2015, 9:07:17 AM
to nextflow
Hi, 

The process exec block is meant to be used when you need to run Java/Groovy code in a parallel manner, though the real goal would be to distribute the execution in the cluster in the same way as process scripts.

The problem is that generally Nextflow is used to run pipelines with a cluster manager, like SGE or LSF.

In this kind of environment distributing native code is quite useless because the scheduling overhead would be longer than the actual execution time of the task.

For this reason, process native code is currently executed locally even when a grid executor is used. At the moment, the only executor that is able to run native process code in a distributed manner is the GridGain (experimental) executor.


To sum up, process exec can be useful to run some native code in a parallel manner, but it is a feature that has to be considered experimental.


Cheers,
Paolo
  
 
