The output files are saved only to work/ dir even though publishDir is set


guaira.mel...@gmail.com

Jun 26, 2018, 5:57:21 PM
to Nextflow
Hello there!

I am pretty much a novice in Nextflow. I am trying to create a simple pipeline that runs fastp (quite similar to FastQC, which is widely used in the bioinformatics community). Here is what I have done:

#!/usr/bin/env nextflow

params.fastq_files = '/home/tain/Documents/RNA_seq/Data/*.fastq'
params.outdir = '/home/tain/Documents/RNA_seq/QC/'

Channel
    .fromPath( params.fastq_files )
    .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
    .into { files_QC_ch }

process fastp {
    publishDir params.outdir, mode: 'copy', pattern: '*.html'

    input:
    file(fastq_file) from files_QC_ch

    output:
    set file('fastp_*'), file('*.html') into fastp_results_ch

    """
    fastp -w 17 -i ${fastq_file} -o fastp_${fastq_file} -h ${fastq_file}_fastp.html
    """
}

However, the .html and fastp_ files (the output) are not saved to the QC folder; instead they are saved to the work/process/ folder. What am I doing wrong? I have already looked at other pipelines (https://github.com/SciLifeLab/NGI-RNAseq/blob/master/main.nf) but I still cannot figure out what is causing this.

I would deeply appreciate any help.

Tain

Paolo Di Tommaso

Jun 27, 2018, 10:08:22 AM
to nextflow
Those files are not copied to the QC folder because in the publishDir you have specified `pattern: '*.html'`. Therefore only files with that suffix are *copied* there.
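As a rough illustration of how a glob such as `*.html` selects files, here is a minimal shell sketch. It only mimics the filtering idea with the shell's own glob matching, which is an assumption: Nextflow applies its own glob rules to the task outputs, and they may differ in details. The helper `matches_pattern` and the three file names are hypothetical.

```shell
# Return success when file name $1 matches glob pattern $2.
# $2 is deliberately left unquoted in the case arm so it acts as a glob.
matches_pattern() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Hypothetical outputs of one fastp task run on sample.fastq.
for f in sample.fastq_fastp.html fastp.json fastp_sample.fastq; do
    if matches_pattern "$f" '*.html'; then
        echo "would be published: $f"
    else
        echo "stays in the work dir only: $f"
    fi
done
```

Under this sketch only the report ending in `.html` is selected; the other two files remain where the task wrote them.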


p

--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

guaira.mel...@gmail.com

Jul 5, 2018, 12:56:08 PM
to Nextflow
Hello.

Maybe I did not explain myself clearly enough. The fastp command generates three output files (one .html, one .json and one .fastq). What I want is to save only the .html files to a desired folder (so I specified the pattern "*.html" in the publishDir directive). However, the files are not copied there; instead all three files are saved in the work/process/ folder.

Tain.


guaira.mel...@gmail.com

Jul 5, 2018, 1:10:57 PM
to Nextflow
It is not even working with the example from the documentation:

#!/usr/bin/env nextflow


process foo {
    publishDir '/home/tain/Documents/RNA_seq/QC/'

    output:
    file 'chunk_*' into letters

    '''
    printf 'Hola' | split -b 1 - chunk_
    '''
}

However there is nothing in the path: 
'/home/tain/Documents/RNA_seq/QC/'

And the output is in the work dir:
work/14/8be1c980a52d25a0564de0239eda95$ ls
chunk_aa  chunk_ab  chunk_ac  chunk_ad

Why is this happening?

Paolo Di Tommaso

Jul 6, 2018, 3:20:41 AM
to nextflow
Please include the `.nextflow.log` file produced by the execution. 

p


guaira.mel...@gmail.com

Jul 9, 2018, 11:08:21 AM
to Nextflow
Attached you will find the .nextflow.log file resulting from running:

#!/usr/bin/env nextflow


params.fastq_files = '/home/tain/Documents/RNA_seq/Data/*.fastq'
params.outdir = '/home/tain/Documents/RNA_seq/QC/'


Channel
    .fromPath( params.fastq_files )
    .ifEmpty { error "Cannot find any reads matching: ${params.fastq_files}" }
    .into { files_QC_ch }

process fastp {
    publishDir pattern: "*.html",
               path: { params.outdir + "fastp/" }, mode: 'copy'

    input:
    file(fastq_file) from files_QC_ch

    output:
    set file('fastp_*'), file('*.html') into fastp_results_ch

    """
    fastp -w 15 -i ${fastq_file} -o fastp_${fastq_file} -h ${fastq_file}_fastp.html
    """
}

nextflow run fastq_test.nf


nextflow.log

Paolo Di Tommaso

Jul 9, 2018, 11:14:12 AM
to nextflow
You are using a very old NF version (0.14.3), please update to the latest one (0.30.2).

If the problem persists, please send the log created by running Nextflow with this command line:


nextflow -trace nextflow run fastq_test.nf
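For anyone who wants to check for an outdated install before debugging further, a minimal shell sketch follows. The helper `version_lt` is hypothetical and assumes a `sort` that supports `-V` (version sort, as in GNU coreutils); the two version numbers are simply the ones quoted in this thread. The installed version itself can be read with `nextflow -version`, and `nextflow self-update` performs the upgrade.

```shell
# Return success (0) when version $1 is strictly older than version $2.
# Relies on `sort -V`, which orders dotted version strings numerically.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

installed="0.14.3"   # version reported in the log from this thread
required="0.30.2"    # latest release named in this thread

if version_lt "$installed" "$required"; then
    echo "Nextflow $installed is older than $required; try: nextflow self-update"
else
    echo "Nextflow $installed is recent enough"
fi
```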




p

guaira.mel...@gmail.com

Jul 9, 2018, 1:01:01 PM
to Nextflow
After updating Nextflow, everything runs smoothly. Next time I will check that first.
Thank you very much.

Sam Tischfield

Jul 9, 2018, 1:09:53 PM
to Nextflow
I have a question related to this. I have set publishDir "s3://mybucket/output/", mode: 'copy', overwrite: true at the start of my process.

The process script correctly produces the output file and publishes it to the S3 bucket. However, it also keeps a copy in the S3 work directory that was created (e.g. s3://mybucket/output/q0/).

The output is a fairly large file and I'd like to have it only in the publishDir output. I tried setting mode to "move" but this just threw an error.

Paolo Di Tommaso

Jul 10, 2018, 3:27:21 AM
to nextflow
Please open an issue in the GitHub project, reporting the complete error stack trace.

However, I think the simplest way to manage this is to keep scratch data in a separate bucket that is cleaned up by an automatic S3 lifecycle rule.
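A minimal sketch of that setup, under stated assumptions: the bucket name `my-nextflow-scratch`, the `work/` prefix, the 7-day retention and the script name `main.nf` are all hypothetical placeholders. The script only writes the rule file and prints the two commands (the AWS CLI call that attaches the rule, and a Nextflow run using `-w` to point the work directory at the scratch bucket), so nothing is changed until you run them yourself.

```shell
# Hypothetical scratch bucket and rule file name.
SCRATCH_BUCKET="my-nextflow-scratch"
RULE_FILE="lifecycle.json"

# Lifecycle rule: expire every object under work/ after 7 days.
cat > "$RULE_FILE" <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-nextflow-work",
      "Filter": { "Prefix": "work/" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 }
    }
  ]
}
EOF

# Print the commands instead of executing them.
echo "aws s3api put-bucket-lifecycle-configuration --bucket $SCRATCH_BUCKET --lifecycle-configuration file://$RULE_FILE"
echo "nextflow run main.nf -w s3://$SCRATCH_BUCKET/work"
```

Published results stay in the output bucket untouched, while intermediate files in the scratch bucket expire on their own.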


Hope it helps.


p
