how to pass an AWS S3 path as an absolute path in Nextflow


Tanashri Jaganade

Jun 30, 2021, 3:02:58 AM
to Nextflow

Hi everyone,

I am trying to run a feature barcode analysis in Nextflow.

cellranger count requires absolute paths to be passed in the CSV file.

I need help understanding how I can pass AWS S3 paths as absolute paths for the --libraries option, inside the library.csv file.


For reference, the library file structure is as follows:

fastqs,sample,library_type
/opt/foo/,GEX_sample1,Gene Expression
/opt/foo/,CRISPR_sample1,CRISPR Guide Capture


I want to replace /opt/foo with an S3 path:

s3://bucket-name/path: this does not work, as cellranger expects the path to start with / and to be an absolute path.


Thanks,

Tanashree


Paolo Di Tommaso

Jul 2, 2021, 3:40:00 AM
to nextflow
Can you include the process definition for the cellranger task? 

--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nextflow/ec103e30-ea35-4668-a7fb-c449ac0d47bdn%40googlegroups.com.

tanashree jaganade

Jul 2, 2021, 4:27:56 AM
to Nextflow
params.index='s3://bucket-name/genome/index/path'
params.library='s3://bucket-name/library/file/path/library.csv'
params.featureref='s3://bucket-name/featureref/file/path/feature-ref.csv'

process cellranger_count {

        publishDir "${params.outdir}/count", mode: 'copy', overwrite: false

        input:
        path genome from params.index
        path libraryfile from params.library
        path featurefile from params.featureref

        output:
        file '*' into seurat_ch

        script:
        """
        echo ${libraryfile}
        /docker_main/cellranger-5.0.1/cellranger count \
                --id=${params.id} \
                --transcriptome=${genome} \
                --libraries=${libraryfile} \
                --feature-ref=${featurefile} \
                --expect-cells=1000
        """
}


The structure of the library file (library.csv)
fastqs,sample,library_type
s3://fastqs/path1,s1,antibody
s3://fastqs/path2,s2,Gene expression

The problem in the script when I try to run AWS Batch jobs:

The count process is unable to read the s3:// paths, as cellranger accepts only absolute paths starting with '/' in the fastqs column inside the library.csv file.


Cellranger reference link:

Thanks,
Tanashree

tanashree jaganade

Jul 2, 2021, 4:35:36 AM
to Nextflow
In continuation with this, I would like to mention that the files are taken directly from the .csv file; cellranger doesn't read the paths through channels.

I have tried creating a directory with all the fastq files in Nextflow and giving that path in the CSV, but since it is not an absolute path, cellranger is not able to read it.

The problem here is that cellranger takes a CSV file as an argument, with the FASTQ paths inside it. I want some way for Nextflow to read those paths.


I also tried Amazon FSx, but that did not work either.

Please suggest options, if any.


Thanks,
Tanashree

Paolo Di Tommaso

Jul 2, 2021, 4:43:03 AM
to nextflow
> The files are taken directly from the .csv file, cell ranger doesn't read the paths through channels. 

This is the problem: the files need to be staged in the task work directory, and the csv file created by the task itself so that it contains those staged file paths (provided there's no other way to pass the directory paths directly to cellranger).

p
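A minimal sketch of this suggestion, written as the shell commands the task itself could run. It assumes Nextflow has already staged the fastq directories into the task work directory under the names reads1/ and reads2/ (the directory names, sample names, and library types here are placeholders, not taken from the thread):

```shell
#!/usr/bin/env bash
# Sketch only: reads1/ and reads2/ stand in for directories that
# Nextflow would have staged into the task work directory.
mkdir -p reads1 reads2

# Build library.csv inside the task, so the fastqs column holds
# absolute paths rooted at the work directory.
{
  echo "fastqs,sample,library_type"
  echo "$PWD/reads1,s1,Antibody Capture"
  echo "$PWD/reads2,s2,Gene Expression"
} > library.csv

cat library.csv
```

Inside a Nextflow script block the same commands work, but $PWD must be escaped as \$PWD so Groovy does not try to interpolate it.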

tanashree jaganade

Jul 2, 2021, 5:32:47 AM
to Nextflow
Hi,

Thanks for your prompt response.


So the way I tried to stage the directories is to pass the fastq paths via Nextflow channels as follows, and then use the staged directory names in the CSV file.

params.fastq_dir1='s3://bucket-name/fastq1/path'
params.fastq_dir2='s3://bucket-name/fastq2/path'
params.index='s3://bucket-name/genome/index/path'
params.library='s3://bucket-name/library/file/path/library.csv'
params.featureref='s3://bucket-name/featureref/file/path/feature-ref.csv'

reads1_ch = Channel.fromPath(params.fastq_dir1)
reads2_ch = Channel.fromPath(params.fastq_dir2)

process cellranger_count {

        publishDir "${params.outdir}/count", mode: 'copy', overwrite: false

        input:
        path genome from params.index
        file 'reads1' from reads1_ch.collect()  // staged under the name used in library.csv
        file 'reads2' from reads2_ch.collect()
        path libraryfile from params.library
        path featurefile from params.featureref

        output:
        file '*' into seurat_ch

        script:
        """
        echo ${libraryfile}
        /docker_main/cellranger-5.0.1/cellranger count \
                --id=${params.id} \
                --transcriptome=${genome} \
                --libraries=${libraryfile} \
                --feature-ref=${featurefile} \
                --expect-cells=1000
        """
}


and then passed the reads1 and reads2 directory names in the csv file, but since these are not absolute paths, cellranger could not read the input fastq file paths.

library.csv now is:

fastqs,sample,library_type
reads1/,s1,antibody
reads2/,s2,gene expression

Problem: since these are not absolute paths, cellranger again failed to read them, even via channels.

I would like to know:
1. if there is any way to read the files through channels, create a CSV, and provide a directory to cellranger
2. if there is any way we could make it an absolute path
3. if we could use the path to the scratch directory
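On question 2, one hedged option: since the staged reads1/ and reads2/ directories exist in the task work directory at run time, the task script can resolve them to absolute paths itself (e.g. with readlink -f) before writing the CSV. A sketch, with the directory and sample names taken from the example above:

```shell
#!/usr/bin/env bash
# Sketch: resolve the staged (relative) directories to absolute paths
# at run time, then write them into library.csv for cellranger.
mkdir -p reads1 reads2   # stand-ins for the directories Nextflow stages

abs1=$(readlink -f reads1)
abs2=$(readlink -f reads2)

printf 'fastqs,sample,library_type\n%s,s1,antibody\n%s,s2,gene expression\n' \
        "$abs1" "$abs2" > library.csv
cat library.csv
```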

Thanks,
Tanashree

Jennifer Modliszewski

Feb 16, 2022, 11:30:01 AM
to Nextflow
Hello,

I'm wondering if either of you solved the problem of passing the paths to cellranger? I'm running into the same issue with a Google Cloud Storage bucket.

It seems that the first option mentioned above, reading the files through channels, would work best.

Thanks!
Jen 

tanashree jaganade

Feb 17, 2022, 7:29:59 AM
to next...@googlegroups.com
Hi Jennifer,
I have solved this issue. I copied all the files from S3 to a local Nextflow path and used that path, via a channel, as input to the downstream script.

Let me know if you need my script and I will send it. I did this in bash.
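A rough sketch of that workaround (the bucket name and paths are placeholders, and the aws copy command is only printed rather than executed, since running it would require AWS credentials):

```shell
#!/usr/bin/env bash
# Sketch of the workaround described above: copy the fastqs from S3
# to a local directory first, then hand that local absolute path to
# the pipeline. Bucket and paths below are hypothetical.
S3_FASTQS="s3://bucket-name/fastq1/path"
LOCAL_FASTQS="$PWD/local_fastqs/sample1"
mkdir -p "$LOCAL_FASTQS"

# Print the copy command instead of running it (no credentials assumed):
echo aws s3 cp --recursive "$S3_FASTQS" "$LOCAL_FASTQS"
```

The local directory created this way is a plain absolute path, so it can go straight into the fastqs column of library.csv.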

Thanks,
Tanashree 