Writing results on S3 bucket fails


Lavi Bharath

Nov 17, 2017, 1:54:04 AM11/17/17
to Nextflow
Hi,
I have just started exploring Nextflow with containers, and I am trying to read and write results to my S3 bucket.

#!/usr/bin/env nextflow

params.reads = "s3://test/DHB162-GCCAAT_S4_L001_R1_001.fastq.gz"
input_files = Channel.fromPath( params.reads )
process fastqc  {
        input:
        file reads from input_files
        output:
        file 'DHB162-GCCAAT_S4_L001_R1_001_fastqc.{zip,html}'

        script:
        """
        fastqc -q $reads
        """
}
The AWS credentials are in a config file.
./nextflow fastqc.nf -c nextflow.config -with-docker 7ef78900b5dc -w s3://nextflowtest/out/
Some tmp folders are created with .command.run and .command.sh, but no results are generated on S3.

nextflow log as below:

Nov-17 06:44:59.580 [main] DEBUG nextflow.cli.Launcher - $> ./nextflow fastqc.nf -c nextflow.config -with-docker 7ef78900b5dc -w s3://nextflowtest/out/
Nov-17 06:44:59.761 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 0.26.0
Nov-17 06:44:59.771 [main] INFO  nextflow.cli.CmdRun - Launching `fastqc.nf` [distraught_spence] - revision: 4c82f492ac
Nov-17 06:44:59.787 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: nextflow.config
Nov-17 06:44:59.789 [main] DEBUG nextflow.config.ConfigBuilder - User config file: nextflow.config
Nov-17 06:44:59.792 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/ec2-user/nextflow.config
Nov-17 06:44:59.792 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/ec2-user/nextflow.config
Nov-17 06:45:00.117 [main] DEBUG nextflow.config.ConfigBuilder - Setting config profile: 'standard'
Nov-17 06:45:00.223 [main] DEBUG nextflow.config.ConfigBuilder - Enabling execution in Docker container as requested by cli option `-with-docker 7ef78900b5dc`
Nov-17 06:45:00.248 [main] DEBUG nextflow.Session - Session uuid: 6c651745-5355-4a22-987c-d2f25a096e9d
Nov-17 06:45:00.248 [main] DEBUG nextflow.Session - Run name: distraught_spence
Nov-17 06:45:00.250 [main] DEBUG nextflow.Session - Executor pool size: 2
Nov-17 06:45:00.300 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 0.26.0 build 4715
  Modified: 07-11-2017 13:10 UTC 
  System: Linux 4.9.58-18.55.amzn1.x86_64
  Runtime: Groovy 2.4.11 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_141-b15
  Encoding: UTF-8 (ISO-8859-1)
  Process: 10571@localhost null
  CPUs: 1 - Mem: 993.4 MB (279.9 MB) - Swap: 0 (0)
Nov-17 06:45:00.328 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Nov-17 06:45:00.335 [main] DEBUG nextflow.Global - Using AWS credentials defined in nextflow config file
Nov-17 06:45:00.335 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key=Rjh+Eu.., region=ap-southeast-1, access_key=AKIAIZ..}
Nov-17 06:45:02.206 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): /nextflowtest/out

Nov-17 06:45:02.206 [main] DEBUG nextflow.Session - Work-dir: /nextflowtest/out [null]
Nov-17 06:45:02.206 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/ec2-user/bin
Nov-17 06:45:02.280 [main] DEBUG nextflow.Session - Session start invoked
Nov-17 06:45:02.283 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Nov-17 06:45:02.284 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Nov-17 06:45:02.414 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Nov-17 06:45:02.550 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: null
Nov-17 06:45:02.550 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Nov-17 06:45:02.559 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Nov-17 06:45:02.561 [main] INFO  nextflow.executor.Executor - [warm up] executor > local
Nov-17 06:45:02.566 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=1; memory=993.4 MB; capacity=1; pollInterval=100ms; dumpInterval=5m
Nov-17 06:45:02.569 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: LocalPollingMonitor
Nov-17 06:45:02.569 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
Nov-17 06:45:02.573 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: local
Nov-17 06:45:02.603 [main] DEBUG nextflow.Session - >>> barrier register (process: fastqc)
Nov-17 06:45:02.607 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > fastqc -- maxForks: 2
Nov-17 06:45:02.641 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
Nov-17 06:45:02.641 [main] DEBUG nextflow.Session - Session await
Nov-17 06:45:06.415 [Task submitter] INFO  nextflow.Session - [a2/4d95e7] Submitted process > fastqc (1)
Nov-17 06:50:02.736 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 1 -- pending tasks are shown below
~> TaskHandler[id: 1; name: fastqc (1); status: SUBMITTED; exit: -; error: -; workDir: /nextflowtest/out/a2/4d95e70fa64c37a90d112e9a1571c3]


It just hangs as above.

Then I tried the following, i.e. writing the output back to S3.
The command line runs fine, but the results are not being uploaded to the S3 bucket.
./nextflow fastqc.nf -c nextflow.config -with-docker 7ef78900b5dc --output s3://nextflowtest/out/

Is there anything that I am missing here? Thanks.

Regards

Paolo Di Tommaso

Nov 17, 2017, 2:32:41 AM11/17/17
to nextflow
Hi, 

Using S3 as the working directory is only allowed when using a cloud-enabled executor (see here for details).

This does not mean, however, that you cannot have your output in S3. You can do that by using a publishDir directive in your pipeline (or config file), which will copy one or more process outputs to an S3 path.
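For example, a minimal sketch of such a process (the bucket path is illustrative, not taken from the original pipeline):

```groovy
process fastqc {
    // Copy the declared outputs to S3 once the task completes.
    // 'copy' mode is used because the default symlink cannot point into a bucket.
    publishDir 's3://nextflowtest/out/', mode: 'copy'

    input:
    file reads from input_files

    output:
    file '*_fastqc.{zip,html}'

    script:
    """
    fastqc -q $reads
    """
}
```

The work directory stays on local disk; only the files matched by the output declaration are copied to the S3 path.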


Hope it helps

Cheers,
Paolo


--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

Lavi Bharath

Nov 17, 2017, 2:55:31 AM11/17/17
to Nextflow
Thanks Paolo.

Lavi Bharath

Nov 22, 2017, 12:53:10 AM11/22/17
to Nextflow
Hi,
I have a 3-step workflow as follows:
1. bwa index
2. bwa mem
3. variant calling

#!/usr/bin/env nextflow

params.reads  = "s3://test/reads/ecoli/SRR292*_{1,2}.fastq.gz"
params.genome = "s3://test/ecoli_nc_000913/ref/Ecoli_K12_MG1655_NC_000913.fa"

genome_file = file(params.genome)

Channel
    .fromFilePairs( params.reads )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { read_pairs }

process create_index {
    input:
    file(genome_file)

    output:
    file 'Ecoli*.fa*' into bwa_index

    script:
    """
    bwa index ${genome_file}
    samtools faidx ${genome_file}
    """
}

process bwa_mem {
    memory '2 GB'

    input:
    set pair_id, file(reads) from read_pairs
    file '*' from bwa_index

    output:
    set pair_id, file('output.bam') into bam_files

    script:
    """
    bwa mem -R '@RG\tID:test\tSM:test' -t 8 Ecoli_K12_MG1655_NC_000913.fa ${reads} | samtools sort -T test -O BAM -o output.bam -
    samtools index output.bam
    """
}

process lofreq {
    memory '2 GB'

    input:
    set pair_id, bam_file from bam_files
    file '*' from bwa_index
    file(genome_file)

    output:
    set pair_id, file('output.vcf') into variants

    script:
    """
    lofreq call-parallel --pp-threads 4 -f Ecoli_K12_MG1655_NC_000913.fa -o output.vcf ${bam_file}
    """
}


Steps 1 and 2 complete successfully and the results are written to the S3 bucket.
For the variant-calling step, .command.sh fails:

#!/bin/bash -ue
lofreq call-parallel --pp-threads 2 -f Ecoli_K12_MG1655_NC_000913.fa -o output.vcf    /test/75/bba7a1e698b5e9fa09256647f9807e/output.bam

The path of output.bam is not right; the s3:// part is missing. The script works fine when run locally, but with AWS Batch and the "-w  " option the command line fails as below:

"CRITICAL [2017-11-22 04:14:07,157]: Couldn't determine BAM file from argument list or file doesn't exist"

Thanks for your time and help.

Regards




Paolo Di Tommaso

Nov 22, 2017, 1:47:02 AM11/22/17
to nextflow
Could you please include the NF output and the `.nextflow.log` file ? 

p

Lavi Bharath

Nov 22, 2017, 1:56:42 AM11/22/17
to Nextflow
Nextflow output:
nxf-scratch-dir ip-172-31-3-150:/tmp/nxf.f1p1JF8IHQ
CRITICAL [2017-11-22 04:14:07,157]: Couldn't determine BAM file from argument list or file doesn't exist

The user-provided path output.vcf does not exist.


.command.err
CRITICAL [2017-11-22 04:14:07,157]: Couldn't determine BAM file from argument list or file doesn't exist

Command line:
./nextflow run lofreq.nf -with-docker andreaswilm/lacer-lofreq:0.2 -w s3://test/

Please let me know if you need any further details. Thanks.


Paolo Di Tommaso

Nov 22, 2017, 2:22:44 AM11/22/17
to nextflow
There's a problem in the `lofreq` process input declaration.



I guess the following 

      input:
        set pair_id, bam_file from bam_files 

should be: 

      input:
        set pair_id, file(bam_file) from bam_files


Otherwise that file is not handled as such by NF.  
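Applied in full, the corrected input block would look like this (a sketch; only the bam_files line changes with respect to the original script):

```groovy
process lofreq {
    memory '2 GB'

    input:
    // file() tells NF to stage the object from the S3 work dir
    // into the task directory before the script runs
    set pair_id, file(bam_file) from bam_files
    file '*' from bwa_index
    file(genome_file)

    output:
    set pair_id, file('output.vcf') into variants

    script:
    """
    lofreq call-parallel --pp-threads 4 -f Ecoli_K12_MG1655_NC_000913.fa -o output.vcf ${bam_file}
    """
}
```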

Hope it helps. 

p

Lavi Bharath

Nov 22, 2017, 4:11:32 AM11/22/17
to Nextflow
Thanks.