S3 permissions error with Nextflow and AWS Batch


stephen mclaughlin

Jan 31, 2018, 4:19:22 PM
to Nextflow
Hi there!

   I have been trying to get the RNASeq workflow from your blog post to work:


  I was interested in reproducing it to test Nextflow with AWS Batch. I was having trouble with an S3 403 error, so I decided to write something much simpler for testing purposes, which I am sharing here:


  I edited out the AWS credentials, but this is exactly what I ran other than that.  When I run this, here is how it looks on the terminal:

nextflow run aws_batch_test.nf --fastq_input C097F_N_111207.2.AGTTGCTT_R2_xxx.fastq.gz --outdir s3://bioinformatics-analysis/seqtk_test/output -profile awsbatch -w s3://bioinformatics-analysis/seqtk_test/work/
N E X T F L O W  ~  version 0.27.2
Launching `aws_batch_test.nf` [jovial_gates] - revision: d09a07722f
[warm up] executor > awsbatch
[40/61bc3b] Submitted process > seqtk
ERROR ~ Error executing process > 'seqtk'

Caused by:
  Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 142494674AE68741; S3 Extended Request ID: gbFUYCP0fOoFOnc8NNg0xYRswaKJjNT+RUb9bGQykmiHkq5oI+meQYzkPE/j5WTF3Kb7e3rPUf0=)


 -- Check '.nextflow.log' file for details

  I also tried with Nextflow 0.27.3 and the error message is the same.  The funny thing is that the workflow actually appears to have run successfully:

stephen@blahblah:~/software/mclaugsf/aws-batch-test$ s3cmd ls s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/
                       DIR   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/tmp/
2018-01-31 20:18         0   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/
2018-01-31 20:18         6   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/.command.begin
2018-01-31 20:19         0   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/.command.err
2018-01-31 20:18        49   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/.command.log
2018-01-31 20:19         0   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/.command.out
2018-01-31 20:18      3197   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/.command.run
2018-01-31 20:18        89   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/.command.sh
2018-01-31 20:18         1   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/.exitcode
2018-01-31 20:18      2796   s3://bioinformatics-analysis/seqtk_test/work/40/61bc3ba659ece4d98f991b70e125bc/output.fastq

  The S3 403 error appears to be occurring after the output file, output.fastq, was successfully created.  

Any tips appreciated!

Thank you,
Stephen


Paolo Di Tommaso

Feb 1, 2018, 4:06:58 AM
to nextflow
I guess it's a problem with the AWS IAM permissions, which are quite tricky to configure.

Make sure that the bucket is in the same region where AWS Batch is running. Also check that the Instance Role configured in the AWS Batch compute environment has full S3 read-write permissions.


Hope it helps. 
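For anyone following along, these two checks correspond to settings that typically live in `nextflow.config`. A minimal sketch of an `awsbatch` profile for the Nextflow versions discussed here (the queue name, region, and CLI path are placeholders, not values from this thread):

```groovy
// Hypothetical awsbatch profile -- adjust every value to your own setup
profiles {
    awsbatch {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'   // your AWS Batch job queue
        aws.region       = 'us-east-1'        // should match the S3 bucket's region
        // path to the aws cli inside the AMI used by the compute environment
        executor.awscli  = '/home/ec2-user/miniconda/bin/aws'
    }
}
```

If the bucket and `aws.region` disagree, or the Instance Role lacks S3 permissions, 403 errors like the one above are a common symptom.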



--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

stephen mclaughlin

Feb 1, 2018, 10:47:00 AM
to Nextflow
Thanks for your reply, Paolo.

We are scratching our heads over here.  I think everything is configured correctly.  The instance I'm running it from appears to have full access to the S3 bucket (I can read and write and so can nextflow it appears).  I logged into the EC2 instance that got kicked off successfully by AWS Batch while it was running, attached to the running container, and was able to successfully read and write to the S3 bucket from the running container.  

The bucket is in the same region where AWS Batch is running.  It even runs successfully and writes the correct output to the S3 bucket.  There is some event that happens at the very end of a job after everything worked perfectly well that is throwing this S3 403 error.  A single process works perfectly well, but it's the transition to the next process where the failure occurs.   If you think of anything else please let me know.

Thanks a lot!
Stephen

Paolo Di Tommaso

Feb 1, 2018, 11:02:16 AM
to nextflow
Could you include the full error stack trace from the .nextflow.log file?


p


stephen mclaughlin

Feb 1, 2018, 11:45:08 AM
to Nextflow
Hi Paolo,
   I have attached the logfile.  Thanks!
Stephen
nextflow.log

Paolo Di Tommaso

Feb 1, 2018, 11:52:08 AM
to nextflow
This is running on your computer, right? How have you specified the AWS security credentials? In the nextflow config, or in the AWS configuration file/env?

Are you able to use `aws s3 ls ... etc` instead of `s3cmd` command ? 




stephen mclaughlin

Feb 1, 2018, 11:55:44 AM
to Nextflow
I'm running it from another Amazon instance, actually.  

aws s3 ls works fine.  Here's that directory:

stephen@xxx:~/software/mclaugsf/aws-batch-test$ aws s3 ls s3://bioinformatics-analysis/seqtk_test/work/60/5c0d1d98e47b333daa7d8d3f4df9f3/
                           PRE tmp/
2018-02-01 11:23:15          0
2018-02-01 11:26:05          6 .command.begin
2018-02-01 11:26:17          0 .command.err
2018-02-01 11:26:08         49 .command.log
2018-02-01 11:26:17          0 .command.out
2018-02-01 11:23:16       3197 .command.run
2018-02-01 11:23:16         89 .command.sh
2018-02-01 11:26:08          1 .exitcode
2018-02-01 11:26:07       2796 output.fastq

Paolo Di Tommaso

Feb 1, 2018, 11:59:00 AM
to nextflow
How are the AWS credentials configured on this instance?


p


stephen mclaughlin

Feb 1, 2018, 12:39:01 PM
to Nextflow
Hi Paolo,
   We got it to work, and it turns out it was an S3 permissions issue. I'll describe it in more detail in case it helps someone else.

This is what our instance policy was initially set to:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bioinformatics-analysis"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::bioinformatics-analysis/*"
            ]
        }
    ]
}

  There was also a user policy (for the instance I was running it on) that was configured the same way as this. What we ended up doing was creating a new policy with the AWS visual editor that looked like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:HeadBucket",
                "s3:ListObjects"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::bioinformatics-analysis",
                "arn:aws:s3:::bioinformatics-analysis/*"
            ]
        }
    ]
}
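Comparing the two policies, one difference worth noting (my hypothesis; we did not pin down the exact missing action) is that the new policy's `s3:*` grant includes bucket-level actions such as `s3:GetBucketLocation`, which the S3 client used by Nextflow may call; the original policy only allowed `s3:ListBucket` on the bucket itself. If you would rather not grant `s3:*`, a narrower bucket-level statement along these lines might be sufficient:

```json
{
    "Effect": "Allow",
    "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::bioinformatics-analysis"
}
```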

We applied that as the instance policy and ran it, but that didn't fix the issue. It wasn't until we also applied the same S3 policy to the IAM user whose credentials are in the nextflow.config file that the issue was fixed.
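For context, the credentials in nextflow.config are set via the `aws` scope, roughly like this (a sketch with placeholder values, not our actual file) -- the IAM user that owns these keys is the one whose policy also needed updating:

```groovy
// Sketch of the aws block in nextflow.config; all values are placeholders
aws {
    accessKey = 'AKIAXXXXXXXXXXXXXXXX'
    secretKey = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
    region    = 'us-east-1'   // same region as the S3 bucket and Batch queue
}
```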

Thank you very much for your time and assistance!  I sincerely appreciate it.

Thanks,
Stephen

Paolo Di Tommaso

Feb 1, 2018, 12:44:07 PM
to nextflow
Great!
