About cellranger_workflow


Sebastián 'Nacho' Zúñiga Norman

Jun 11, 2021, 7:05:16 AM
to cumulus...@googlegroups.com

Dear Cumulus support team, 


I am Sebastián Zúñiga, a research member at the Bioinformatics Medical Centre of the University of Turku, Finland. It is a true pleasure to contact you.


I have been entrusted with building a Cell Ranger pipeline. While surveying the state of the art I came across your approach, and it definitely caught my attention. I am taking the liberty of writing to ask a question that might help me in this endeavor.


I see that the workflow lets you indicate whether or not to run mkfastq and count, which is great! As input, however, you take the samplesheet.csv. The information it contains is what I cannot automate in a pipeline (given the different nature of the sequenced samples). My question is: is reading the csv part of your design, or does 10X Genomics provide an option to use a csv as a template for calling these two commands sequentially? I have not seen this in 10X’s documentation, but it’s a great approach.


Any pointers you can give me in this direction would be greatly appreciated.

Thank you so much for your time.


Best regards from Finland.


Thank you
Kind regards

_________________________________
Sebastián 'Nacho' Zúñiga Norman
Project Researcher
Bioinformatics Medical Center

Yiming Yang

Jun 11, 2021, 2:19:07 PM
to Sebastián 'Nacho' Zúñiga Norman, Cumulus Support
Hello Sebastián,

Glad to hear from you, and thank you for your interest in Cumulus!

For our cellranger_workflow WDL, we do indeed handle the input via sample sheets in csv format. This is our own design and is not based on 10X's API. You can read in more detail how to prepare sample sheets for cellranger_workflow here: https://cumulus.readthedocs.io/en/stable/cellranger/index.html#prepare-a-sample-sheet.

In brief, cellranger_workflow is a top-level WDL that triggers the corresponding workflows mainly based on the "DataType" column (an optional column whose default is "rna"), so that it covers different omics approaches in a hopefully user-transparent way. For the cellranger mkfastq and count steps, the workflow then automatically generates the corresponding 10X sample sheets in csv format for each of them. These generated sheets are different from the sample sheet that the user provides to the cellranger_workflow WDL.
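
As a rough illustration of the dispatch idea described above (not the actual Cumulus implementation), the following Python sketch groups the rows of a user-provided sample sheet by "DataType" and falls back to "rna" when the column is missing or empty. The function name, the example sheet, and all sample names, references, paths, and index values in it are hypothetical placeholders. The real column set is described in the documentation linked above.

```python
import csv
import io
from collections import defaultdict

DEFAULT_DATA_TYPE = "rna"  # default stated in the reply above


def group_by_data_type(sheet_text):
    """Group sample-sheet rows by DataType (hypothetical helper, not Cumulus code)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(sheet_text)):
        # A missing column yields None and an empty cell yields "";
        # both fall back to the default data type.
        data_type = (row.get("DataType") or "").strip() or DEFAULT_DATA_TYPE
        groups[data_type].append(row)
    return dict(groups)


# Hypothetical sheet: every value below is a placeholder.
EXAMPLE_SHEET = """Sample,Reference,Flowcell,Lane,Index,DataType
s1,ref_placeholder,/data/run1,1-2,index_a,rna
s2,ref_placeholder,/data/run1,3-4,index_b,
s3,ref_placeholder,/data/run2,*,index_c,atac
"""

if __name__ == "__main__":
    for data_type, rows in group_by_data_type(EXAMPLE_SHEET).items():
        print(data_type, [r["Sample"] for r in rows])
```

Each group could then be handed to the sub-workflow for its data type, which is where a per-step 10X sample sheet would be generated.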

Hope it helps. And feel free to reach out to us with any questions.


Sincerely,
Yiming


Yiming Yang

Jul 30, 2021, 5:45:02 PM
to Sebastian Zuniga Norman, Cumulus Support
Dear Sebastian,

I'm trying to understand your situation. For running the cellranger_workflow WDL starting from BCL files (i.e. with "run_mkfastq" set to true), the workflow uses the docker image "gcr.io/broad-cumulus/cellranger:<version>", which is hosted on Broad Institute's Google Cloud registry and is only open to Broad users. This is because of the license restrictions on Illumina's bcl2fastq software.

We have a help page on creating your own cellranger docker image containing bcl2fastq at https://cumulus.readthedocs.io/en/stable/bcl2fastq.html. I'm not sure if this is what you are looking for.

We don't have experience deploying WDLs on a SLURM server. For issues related to this topic, you may want to contact the Cromwell team. In addition, there is a Python package called Caper (https://github.com/ENCODE-DCC/caper), developed by a team at Stanford, which wraps Cromwell for different computing environments (including SLURM) and provides a server-client mode of usage. You may want to check it out for your use case.
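
For the "run when new BCL files are found" part of the question, one possible shape is sketched below in Python. This is not something we have tested. It assumes the sequencer writes an RTAComplete.txt marker into a run folder when the run finishes, that someone places a samplesheet.csv in that folder by hand, and that the workflow input keys are spelled as shown (please check them against the documentation). The ".submitted" flag file, the inputs file name, and all function names are hypothetical. The sketch only builds the Cromwell "run" command and does not execute it.

```python
import json
from pathlib import Path

MARKER = "RTAComplete.txt"  # assumption: written by the sequencer when a run finishes
DONE_FLAG = ".submitted"    # hypothetical flag file left behind by this script


def find_new_runs(bcl_root):
    """Return run folders that look complete and were not handled yet."""
    runs = []
    for run_dir in sorted(Path(bcl_root).iterdir()):
        if not run_dir.is_dir():
            continue
        if (run_dir / MARKER).exists() and not (run_dir / DONE_FLAG).exists():
            runs.append(run_dir)
    return runs


def build_inputs(sample_sheet, output_dir):
    """Workflow inputs; the key names are assumptions to verify against the docs."""
    return {
        "cellranger_workflow.input_csv_file": str(sample_sheet),
        "cellranger_workflow.output_directory": str(output_dir),
        "cellranger_workflow.run_mkfastq": True,
        "cellranger_workflow.run_count": True,
    }


def cromwell_command(cromwell_jar, wdl, inputs_json):
    """Build (but do not execute) a Cromwell 'run' command line."""
    return ["java", "-jar", str(cromwell_jar), "run", str(wdl),
            "--inputs", str(inputs_json)]


def prepare_submission(run_dir, wdl, cromwell_jar, output_root):
    """Write the inputs JSON, flag the run as handled, return the command."""
    run_dir = Path(run_dir)
    inputs_path = run_dir / "cellranger_inputs.json"
    inputs = build_inputs(run_dir / "samplesheet.csv",
                          Path(output_root) / run_dir.name)
    inputs_path.write_text(json.dumps(inputs, indent=2))
    (run_dir / DONE_FLAG).touch()  # the next scan will skip this folder
    return cromwell_command(cromwell_jar, wdl, inputs_path)
```

A scheduled job could call find_new_runs on the sequencer output folder and pass each result to prepare_submission, then hand the returned command to subprocess.run or replace it with the equivalent Caper call.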


Sincerely,
Yiming

On Thu, Jul 29, 2021 at 6:17 AM Sebastian Zuniga Norman <siz...@utu.fi> wrote:

Dear Yiming,

 

After some weeks of tests, I have managed to launch the Cromwell service using the configuration for our workload manager (SLURM). However, I first need to develop a simplified, automated way to call cellranger_workflow (starting with mkfastq) when new BCL files are found. So I’m writing to ask whether there is any possibility of getting in contact with the development team. I see on the GitHub page that the environment in which your users access the service is totally different from ours, and I would appreciate any pointers on the best approach to building ours.

 

Thank you so much for your diligence and time,

 

Best regards from Finland,

 

From: Yiming Yang <yy...@broadinstitute.org>
Sent: Wednesday, June 16, 2021 22:46
To: Sebastian Zuniga Norman <siz...@utu.fi>
Subject: Re: About cellranger_workflow

 

Hi Sebastian,

 

Yes, you are right. Cromwell should be the tool to look into for your situation. You may also need docker and java on your server to run Cromwell. In fact, we test our WDLs with Cromwell on an Ubuntu Linux server. The major difference is that there is no way to specify the memory or disk space to use when running on a server directly through Cromwell, as those options apply only on the cloud.
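
A quick way to check the two prerequisites mentioned above before trying Cromwell on a server is sketched here in Python. It only tests that the executables are on the PATH and says nothing about versions or permissions. The helper name is hypothetical.

```python
import shutil

REQUIRED_TOOLS = ("java", "docker")  # prerequisites named in the message above


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on this server:", ", ".join(missing))
    else:
        print("java and docker were both found on the PATH")
```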

 

WDL is open-source (https://github.com/openwdl/wdl) and has no restriction on execution.

 

 

Sincerely,

Yiming

 

On Wed, Jun 16, 2021 at 5:23 AM Sebastian Zuniga Norman <siz...@utu.fi> wrote:

Hi there again,

 

I think I’ve answered my own question: Cromwell. That’s the approach. I will look into whether it would be possible to include it in our cluster. This is exciting news.

 

BR,
