Having one input file with a list of files

manona martin
Apr 16, 2018, 8:52:50 AM
to Nextflow
I'm just getting started with Nextflow and I'm stuck on this problem.

I'd like to pass a single input file that contains a list of other file names.
Example: "nextflow project.nf --file file.lst ..."
In this example, "file.lst" lists three files: 31_pep.fas, 32_pep.fas and 33_pep.fas.

What I'd like to do is iterate over each element of the list (I'm running Muscle and Mafft on them). Here is what it looks like for now:


listSeq=params.file

Channel.fromPath(listSeq)
        .splitText()
        .map { file(it) }
        .set { file_list }


process listAlign {
    input :
    file seq from file_list
   
    output:
    file "*.aln" into listeAln, listeAln2, listeAln3, listeAln4, listeAln5
    stdout out_list
       
    script:
    print "${seq}" > out_list
    if( mode == 'muscle' || mode == 'mafft' || mode == 'clustalo' || mode == 'all' ){
       
        template "${mode}list_command.sh"   
           
    }else
        error "Invalid alignment mode: ${mode}"

}


My template musclelist_command.sh contains :

#!/bin/bash
echo "process started at `date`"
muscle -in ${seq} -out tmp_out.aln
t_coffee -other_pg seq_reformat -in tmp_out.aln -output phylip -out ${seq} _AS_${mode}.aln
echo "Done !"

And it just doesn't work, and I don't know why! If I could get some help, that would save my life.
Thanks

manona martin
Apr 16, 2018, 8:57:30 AM
to Nextflow
By the way, here's the output I get:
N E X T F L O W  ~  version 0.28.0
Launching `projetc.nf` [loving_nobel] - revision: 0e62dacb19
[warm up] executor > local
COG.lst
falsefalse[b4/8308a6] Submitted process > listAlign (1)
[24/8352ec] Submitted process > listAlign (2)
ERROR ~ Error executing process > 'listAlign (1)'

Caused by:
  Process `listAlign (1)` terminated with an error exit status (127)

Command executed [/home/Tests/templates/musclelist_command.sh]:


  #!/bin/bash
  echo "process started at `date`"
  muscle -in 31450_pep.fas
   -out tmp_out.aln
  t_coffee -other_pg seq_reformat -in tmp_out.aln -output phylip -out 31450_pep.fas
  _AS_muscle.aln
  echo "Done !"

Command exit status:
  127

Command output:
  (empty)

Command wrapper:
  .command.run: line 61: 31450_pep.fas: command not found


Steve
Apr 16, 2018, 1:46:06 PM
to Nextflow
tl;dr: The channel itself is set up correctly, but `.splitText` keeps the trailing line break on each file name. Use `.splitCsv` instead.

I reproduced the code like this:

$ cat file.lst
31_pep.fas
32_pep.fas
33_pep.fas

$ cat templates/list_command.sh
#!/bin/bash

echo "muscle -in ${seq} -out tmp_out.aln"

echo "t_coffee -other_pg seq_reformat -in tmp_out.aln -output phylip -out ${seq}_AS_${mode}.aln"

touch "${seq}_AS_${mode}.aln"

echo "Done !"


$ cat main.nf
listSeq = params.file

Channel.fromPath(listSeq)
       .splitText()
       .map { file(it) }
       .into { file_list; file_list2 }

file_list2.subscribe { println "[file_list2] ${it}" }
methods = ['muscle', 'mafft', 'clustalo', 'all']

process listAlign {

    input:
    file seq from file_list
    each mode from methods

    output:
    file "*.aln" into listeAln, listeAln2, listeAln3, listeAln4, listeAln5
    stdout out_list

    script:
    print "${seq}" > out_list
    if( mode == 'muscle' || mode == 'mafft' || mode == 'clustalo' || mode == 'all' ){
        template "list_command.sh"
    }else
        error "Invalid alignment mode: ${mode}"

}



And I get the same error:


ERROR ~ Error executing process > 'listAlign (7)'

Caused by:
  Process `listAlign (7)` terminated with an error exit status (127)

Command executed [/Users/kellys04/projects/nf-example1/templates/list_command.sh]:

  #!/bin/bash

  echo "muscle -in 32_pep.fas
   -out tmp_out.aln"

  echo "t_coffee -other_pg seq_reformat -in tmp_out.aln -output phylip -out 32_pep.fas
  _AS_clustalo.aln"

  touch "32_pep.fas
  _AS_clustalo.aln"

  echo "Done !"

Command exit status:
  127

Command output:
  (empty)

Command wrapper:
  .command.run: line 64: 32_pep.fas: command not found

Work dir:
  /Users/kellys04/projects/nf-example1/work/73/7f4dd907c5d3c090ae8838559bb816

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (6)
make: *** [run] Error 1




If you check the '.command.run' file you can see the source of the error:

$ cat -n /Users/kellys04/projects/nf-example1/work/73/7f4dd907c5d3c090ae8838559bb816/.command.run
     1    #!/bin/bash
     2    # NEXTFLOW TASK: listAlign (7)
     3    set -e
     4    set -u
     5    NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x
     6
...
...
...
    59    [[ $NXF_SCRATCH ]] && echo "nxf-scratch-dir $HOSTNAME:$NXF_SCRATCH" && cd $NXF_SCRATCH
    60    # stage input files
    61    rm -f 32_pep.fas
    62
    63    ln -s /Users/kellys04/projects/nf-example1/32_pep.fas
    64     32_pep.fas
    65



On line 64, '32_pep.fas' sits on a line of its own, so the shell tries to run it as if it were an executable. You can also see a discrepancy in the `.command.sh` file produced:

$ cat /Users/kellys04/projects/nf-example1/work/73/7f4dd907c5d3c090ae8838559bb816/.command.sh
#!/bin/bash

echo "muscle -in 32_pep.fas
 -out tmp_out.aln"

echo "t_coffee -other_pg seq_reformat -in tmp_out.aln -output phylip -out 32_pep.fas
_AS_clustalo.aln"

touch "32_pep.fas
_AS_clustalo.aln"

echo "Done !"



Line breaks have been introduced. This is because you are using `splitText` to parse the file; this preserves the trailing line break. The solution is to instead use `splitCsv`, like this:

$ cat main.nf
listSeq = params.file

Channel.fromPath(listSeq)
       .splitCsv()
       .map { row ->
           def path = file("${row[0]}")
           return path
       }
       .into { file_list; file_list2 }

file_list2.subscribe { println "[file_list2] ${it}" }
methods = ['muscle', 'mafft', 'clustalo', 'all']

process listAlign {

    input:
    file seq from file_list
    each mode from methods

    output:
    file "*.aln" into listeAln, listeAln2, listeAln3, listeAln4, listeAln5
    stdout out_list

    script:
    print "${seq}" > out_list
    if( mode == 'muscle' || mode == 'mafft' || mode == 'clustalo' || mode == 'all' ){
        template "list_command.sh"
    }else
        error "Invalid alignment mode: ${mode}"

}



output:

$ make run
./nextflow run main.nf  --file file.lst

N E X T F L O W  ~  version 0.28.0
Launching `main.nf` [tiny_davinci] - revision: 43295cc830
[file_list2] /Users/kellys04/projects/nf-example1/31_pep.fas
[file_list2] /Users/kellys04/projects/nf-example1/32_pep.fas
[file_list2] /Users/kellys04/projects/nf-example1/33_pep.fas
[warm up] executor > local
falsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalse
[e4/6f17f8] Submitted process > listAlign (2)
[b5/847e2a] Submitted process > listAlign (1)
[fb/1a22e7] Submitted process > listAlign (4)
[16/8d36e1] Submitted process > listAlign (5)
[bb/4190dd] Submitted process > listAlign (3)
[db/ae889f] Submitted process > listAlign (6)
[a2/c62ef3] Submitted process > listAlign (7)
[e0/eae221] Submitted process > listAlign (8)
[cc/d897c5] Submitted process > listAlign (12)
[80/11bbde] Submitted process > listAlign (11)
[e7/0c7352] Submitted process > listAlign (10)
[b6/005163] Submitted process > listAlign (9)
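
The failure mode can also be reproduced outside Nextflow. The following is a minimal sketch in plain bash, not part of the pipeline: the `echo staging` command is a hypothetical stand-in for the staging line in `.command.run`, and the variable names are made up for illustration. It substitutes a file name with and without a trailing newline into a one-line command and compares the exit statuses.

```shell
#!/bin/bash
# Sketch only: reproduces the failure mode outside Nextflow.
# "echo staging" is a hypothetical stand-in for the real staging command.

nl='
'
raw="32_pep.fas${nl}"    # a line as splitText emits it, newline included
clean=${raw%"$nl"}       # the same value with the trailing newline stripped

# Substitute each value into a one-line command and record the exit status.
status_raw=0
bash -c "echo staging ${raw} 32_pep.fas" >/dev/null 2>&1 || status_raw=$?

status_clean=0
bash -c "echo staging ${clean} 32_pep.fas" >/dev/null 2>&1 || status_clean=$?

echo "with newline: exit ${status_raw}; without newline: exit ${status_clean}"
```

With the newline in place, the shell sees two commands instead of one, and the second is the bare file name, which is where the exit status 127 (command not found) comes from.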

Hope this helps.

Felix Kokocinski
May 29, 2018, 4:56:23 AM
to Nextflow
This is an interesting approach!
Does anyone know how this will work if the executor is not 'local'? The output of the 'file_list' channel is text, not file handles, so I'm not sure if Nextflow will find the input files in a process running on another host? (In particular if the files are stored in sub-directories)

  Thanks, Felix 

Paolo Di Tommaso
May 29, 2018, 9:00:18 AM
to nextflow
When the channel content is a text string and it's attached to an input declared as a `file`, it's automatically saved to a temporary file in the task work directory, therefore it's accessible to a remote host. 



--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

Felix Kokocinski
May 29, 2018, 10:18:00 AM
to next...@googlegroups.com
Beautiful!
 Thanks, F