multiple files (but different pattern)


RFenouil

Jan 5, 2018, 10:42:44 AM
to Nextflow
Hello,

I'm back with another basic question.
I would like to pass a file or a list of files (let's say fasta) as a parameter of a workflow.

I started with :
  optionalFile=Channel.fromPath(params.optionalFile)

I can therefore start the workflow using:
  nextflow myscript.nf --optionalFile="test.fasta"

I can also feed the channel with multiple files, as you suggest in the documentation, using something like this:
  nextflow myscript.nf --optionalFile="*.fasta"

However, I don't understand how to let the user feed the channel with multiple fasta files given by full path and name, for example fasta files stored in different folders.
Something like this:
  nextflow myscript.nf --optionalFile="A.fasta B.fasta ../other/C.fasta"

Can you please help me with this concern ?

Thank you very much (& happy new year ;)) !

Paolo Di Tommaso

Jan 8, 2018, 3:42:01 AM
to nextflow
Hi, 

Channel.fromPath accepts glob patterns such as `{A,B}.fasta` or `/base/path/**/{A,B,C}.fasta`. The latter traverses the directory tree and returns all files named A.fasta, B.fasta or C.fasta.
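
To get a feel for how such brace and `**` patterns behave, here is a minimal plain-Groovy sketch using the JDK's own glob matcher. It is only an illustration: the file names are made up, relative paths are used for simplicity, and Nextflow's own matching may differ in details.

```groovy
import java.nio.file.FileSystems
import java.nio.file.Paths

def fs = FileSystems.getDefault()

// Brace alternatives: matches A.fasta or B.fasta, nothing else
def simple = fs.getPathMatcher('glob:{A,B}.fasta')
def matchesA = simple.matches(Paths.get('A.fasta'))
def matchesC = simple.matches(Paths.get('C.fasta'))

// '**' crosses directory boundaries, so nested folders are matched too
def deep = fs.getPathMatcher('glob:base/**/{A,B,C}.fasta')
def matchesNested = deep.matches(Paths.get('base/x/y/C.fasta'))
def matchesOther  = deep.matches(Paths.get('base/x/D.fasta'))

assert matchesA && !matchesC
assert matchesNested && !matchesOther
```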

Listing files from explicitly different folders is not supported by the `fromPath` function. A workaround could be the following:


Channel
   .from( params.optionalFile.tokenize() )  // split the parameter into separate file names
   .flatMap{ files(it) }                    // convert each file path string into file objects
   .set{ new_channel }                      // create a new channel
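
To see what the tokenizing step produces on its own, here is a minimal plain-Groovy sketch (no Nextflow needed). The `optionalFile` string is a hypothetical stand-in for `params.optionalFile`.

```groovy
// Hypothetical stand-in for params.optionalFile, as passed on the command line
def optionalFile = 'A.fasta B.fasta ../other/C.fasta'

// tokenize() without arguments splits on whitespace and drops empty tokens
def names = optionalFile.tokenize()

assert names == ['A.fasta', 'B.fasta', '../other/C.fasta']
println names
```

Each of these names would then be turned into a file object by the `flatMap` step above.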


Hope it helps

Cheers,
Paolo


--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

RFenouil

Jan 8, 2018, 5:33:36 AM
to Nextflow
Dear Paolo,

thank you very much for your help !
Your "tokenize()" solution is definitely what I was looking for.

I had not understood that using the "files()" method would result in a channel structure similar to the one produced by the "fromPath" factory.
BTW, is the "files()" method something that you specifically implemented in Nextflow, or is it native to Groovy?

Another question I have is what happens to a file input (in a process) when an invalid file reference is given?
Here is an example:

optionalFile=Channel.empty()

process P01_test {
  input:
    file fileToProcess from optionalFile.ifEmpty{222}

  output:
    stdout result

  script:
    fileName=fileToProcess.getName()
    """
    echo ${fileName}
    """
}

result.subscribe{ println(it) }

The script shows a filename set to "input.1" for this "invalid file".

Besides trying to understand how things work internally, my goal is to allow the user to provide a filename (or several) as an argument of my script. The process will then use these files if provided (all at once, i.e. with collect), or do something a bit different when no file has been specified. My problem is detecting when no file has been given.
I see several ways of doing that (checking file existence with Groovy calls or even bash), but I would like to know what the "best practice" would be for a clean Nextflow implementation.

All the best,


Romain.

RFenouil

Jan 8, 2018, 11:14:00 AM
to Nextflow
Here is what I came up with, which gives the expected behavior:



// Parameters default values
params.optionalFiles=false

// Channel creation
optionalFiles=Channel.empty()

// Fill channel with file(s) (exits if file does not exist)
if(params.optionalFiles){
    optionalFiles=Channel.from(params.optionalFiles.tokenize()).flatMap{if(!(f=file(it)).exists()) exit(1, "Parameter 'optionalFiles': file '${f.getName()}' does not exist"); return(f)}
}

process P01_test {
  input:
    file fileToProcess from optionalFiles.ifEmpty{'any invalid file ID'}.collect() // replace by any integer value ?

  output:
    stdout result

  script:
    if(fileToProcess.every{it.exists()}){
    """
    echo 'Starting process using file(s) : ${fileToProcess.collect{it.getName()}.join(' -&- ')}'
    """
    } else {
    """
    echo 'Starting process without reference file'
    """
    }
}

result.subscribe{ println(it) }



Supposing 'test.txt' and 'test.nf' files exist in the directory:

>nextflow myScript.nf --optionalFiles 'test.txt test.nf'
[56/537220] Submitted process > P01_test
Starting process using file(s) : test.txt -&- test.nf

>nextflow myScript.nf --optionalFiles 'test.txt lala.txt'
ERROR ~ Parameter 'optionalFiles': file 'lala.txt' does not exist

>nextflow myScript.nf
Starting process without reference file


This is the best I could think of with my limited knowledge, but I'm sure there is room for optimization and a cleaner implementation.
Could you please comment on anything that could be improved in this piece of code (especially the file handling mechanism)? I am still learning and any comment is appreciated :)

I feel that the "ifEmpty" in the input channel is too indirect an approach...

Thank you !

Romain.

Paolo Di Tommaso

Jan 9, 2018, 7:07:53 AM
to nextflow
You can simplify the code using a dummy file variable that works as an empty value. For example:


params.optionalFiles = ''

NO_FILE = file('DUMMY').fileName
optional_ch = ( params.optionalFiles 
            ? Channel.from(params.optionalFiles.tokenize()).flatMap{ files(it) }.filter { it.exists() }.collect() 
            : NO_FILE ) 

process foo {
  
  input: 
  file fileToProcess from optional_ch
  
  script: 
  if( fileToProcess==NO_FILE ) 
  """
  echo command_without_optional_file
  """
  
  else 
  """
  echo command_with_optional --files $fileToProcess
  """

}


Hope it helps. 


Cheers,
Paolo



RFenouil

Jan 9, 2018, 8:41:36 AM
to Nextflow
Thank you.

These files are supposed to be sample-independent and don't need to be consumed. Is it necessary to create a channel?
Is something like this possible?


    NO_FILE = file('DUMMY').fileName
    optionalFiles = ( params.optionalFiles ? params.optionalFiles.tokenize().flatten{file(it).exists()? file(it):exit(1,"file ${it} does not exist")}
                : NO_FILE )


However in this case, it looks like 2 processes (foo) are started, I don't understand why.

Paolo Di Tommaso

Jan 9, 2018, 8:56:08 AM
to nextflow
If you provide a simple value, a process implicitly creates a channel. 

I wouldn't use an exit there. Use a filter as I showed in the previous example, possibly printing a warning message.
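
The filter-and-warn idea can be sketched in plain Groovy as follows. This is only an illustration on an ordinary list, not a Nextflow channel: `optionalFiles` is a hypothetical stand-in for `params.optionalFiles`, and the closure passed to `findAll` plays the role the closure of a channel `filter` would play.

```groovy
// Hypothetical stand-in for params.optionalFiles
def optionalFiles = 'test.txt lala.txt'

// Turn each whitespace-separated name into a File object
def candidates = optionalFiles.tokenize().collect { new File(it) }

// Keep only existing files; warn about the others instead of aborting
def existing = candidates.findAll { f ->
    if( !f.exists() )
        System.err.println "WARN: file '${f.name}' does not exist, skipping it"
    f.exists()
}

println "Files kept: ${existing*.name}"
```

With this approach a mistyped name only produces a warning, and the run continues with the files that were found.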


p


RFenouil

Jan 9, 2018, 10:02:47 AM
to Nextflow
Ok, thank you.

Can you explain why the exit statement is problematic?

Is there a paragraph in the documentation that explains how the implicit conversion from a variable to a channel works?

Especially in these cases:

variable=1 // Binds to a single value, equivalent to Channel.value(1) ?
variable=[1] // Value is consumed, equivalent to Channel.from(1) ?
variable=[1,2] // Same as previous, equivalent to Channel.from(1..2) ?

Do you recommend always defining channels explicitly instead of relying on implicit creation (so we can control emission with 'collect' if needed)?

RFenouil

Jan 10, 2018, 8:04:09 AM
to Nextflow
Can you also point me to the definition or documentation for file() method/class please ?
I don't understand what physically goes in the channel (only the unix path ?), and what your "fileToProcess==NO_FILE" comparison actually acts on ?

Paolo Di Tommaso

Jan 10, 2018, 8:52:41 AM
to nextflow
The `file` and `files` methods create absolute Path objects from the specified file path (or glob).

See the documentation here. If you are interested in the implementation look here


Hope it helps. 

p


RFenouil

Jan 10, 2018, 10:36:38 AM
to Nextflow
Ok that's great !

Sorry if that is a bit off-topic but where does the "complete()" method you are using come from ? I cannot find it documented anywhere...

More importantly, can you confirm if I understood correctly why you use    
    NO_FILE = file('DUMMY').fileName
instead of
    NO_FILE = file('DUMMY')

Is it because files change path during staging, so you have to compare only the file name?

Paolo Di Tommaso

Jan 10, 2018, 11:31:54 AM
to nextflow
The `file` method returns the absolute file `Path`, whereas in the process context you get a file-name-only Path for each input.

Therefore `fileToProcess==NO_FILE` would be false if NO_FILE held the absolute path. For this reason `.fileName` is used: it gives you a Path that matches the one created by the process.
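
The difference can be reproduced with plain `java.nio.file` paths in Groovy. This is a minimal sketch: `absolute` and `staged` are hypothetical names standing in for what `file('DUMMY')` returns and for what the process sees as input.

```groovy
import java.nio.file.Paths

// Stand-in for file('DUMMY') outside a process: an absolute path
def absolute = Paths.get('DUMMY').toAbsolutePath()

// Stand-in for the input as seen inside a process: the file name only
def staged = Paths.get('DUMMY')

// The absolute path and the bare name are different Path values
assert !absolute.equals(staged)

// Taking only the file name makes the comparison succeed
assert absolute.fileName.equals(staged)
```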

The `complete` method is a NF internal extension method. You can find it here.  


p


RFenouil

Jan 11, 2018, 2:29:45 AM
to Nextflow
Great, everything is clear now :)
Thank you very much for your answers.

Romain.