Estimated run time for ~80K cells

Jacqueline Chou

Feb 4, 2023, 8:40:29 PM
to Trinity_CTAT_users
Hi,

I'm running inferCNV on ~80K cells on my MacBook Pro (64 GB RAM). It's currently on 'Step 8: removing average of reference data (before smoothing)'.

Roughly how long would the default infercnv::run function take on my dataset? I'm happy to provide any other information needed.

Thank you.

Brian Haas

Feb 5, 2023, 8:31:41 AM
to Jacqueline Chou, Christophe Georgescu, Trinity_CTAT_users



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas


Christophe Georgescu

Feb 6, 2023, 11:08:05 AM
to Brian Haas, Jacqueline Chou, Trinity_CTAT_users
Hi Jacqueline,

I don't have a precise estimate of how long such a run would take, since I work with smaller datasets, but a full run on a dataset of similar size took a few days (less than a week) when I did one a while back. Run time depends on more than the number of cells, though: the number of detected genes and the size of the annotation groups also matter.

The step you are on does take longer than average, but the two most time-consuming steps are subclustering, if it is done with the random trees method (Leiden subclustering is much faster, and it is the default if you are using a recent version of infercnv), and running the Bayesian mixture model (step 18).

Regards,
Christophe.

Jacqueline Chou

Feb 6, 2023, 4:21:20 PM
to Christophe Georgescu, Brian Haas, Trinity_CTAT_users
Thanks so much, Christophe, for the info. The run finished, but I'm now having trouble visualizing the heatmap. Is there a way to visualize all ~80K cells, or even a subset of them?

Thanks in advance for the guidance.


Christophe Georgescu

Feb 6, 2023, 4:34:49 PM
to Jacqueline Chou, Brian Haas, Trinity_CTAT_users
Hi Jacqueline,

What issue are you running into?

The plot_cnv() method lets you manually re-plot the figure that infercnv normally produces, and it has a number of configuration options. Since you have a large number of cells, I would suggest using the dynamic_resize option so that the figure gets more height and the cells are less compressed.
Another option is the plot_per_group() method, which plots each annotation group (references and/or observations) as a separate figure that uses only the bottom heatmap area. It also has options to sample cells when a group has more than a given number of cells, and to control the sampling frequency. You can also choose to save the infercnv objects it generates internally for each annotation group, and then use them with plot_cnv() if you want specific settings.
Alternatively, you can use the sample_object() method to downsample your results and then plot them with plot_cnv().
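To make the per-group sampling idea concrete, here is a minimal Python sketch of downsampling cells per annotation group before plotting. It is not infercnv's code (infercnv is an R package, and plot_per_group() and sample_object() have their own arguments); the function downsample_groups and its parameters max_cells and every_n are hypothetical, and the sketch only assumes that cell IDs are listed per annotation group. Groups at or below the cap are kept whole; larger groups are thinned either by keeping every n-th cell or by random sampling without replacement.

```python
import random
from typing import Dict, List, Optional


def downsample_groups(
    groups: Dict[str, List[str]],
    max_cells: int = 1000,
    every_n: Optional[int] = None,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Hypothetical helper: pick a subset of cell IDs per annotation group.

    Groups with at most `max_cells` cells are kept whole. Larger groups keep
    every `every_n`-th cell if `every_n` is given, otherwise `max_cells` cells
    drawn at random without replacement. Original cell order is preserved.
    """
    if every_n is not None and every_n < 1:
        raise ValueError("every_n must be a positive integer")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sampled: Dict[str, List[str]] = {}
    for name, cells in groups.items():
        if len(cells) <= max_cells:
            sampled[name] = list(cells)  # small group: keep every cell
        elif every_n is not None:
            sampled[name] = cells[::every_n]  # fixed-frequency thinning
        else:
            keep = set(rng.sample(range(len(cells)), max_cells))
            sampled[name] = [c for i, c in enumerate(cells) if i in keep]
    return sampled


# Toy example: a small reference group and one large observation group.
groups = {
    "reference": [f"ref_{i}" for i in range(500)],
    "tumor": [f"tumor_{i}" for i in range(80_000)],
}
by_count = downsample_groups(groups, max_cells=1000)
by_step = downsample_groups(groups, every_n=100)
print({k: len(v) for k, v in by_count.items()})  # {'reference': 500, 'tumor': 1000}
print({k: len(v) for k, v in by_step.items()})   # {'reference': 500, 'tumor': 800}
```

Capping each group separately, rather than sampling the whole dataset at once, keeps small groups such as the references fully visible while the large observation groups shrink to a plottable size. Within infercnv itself, sample_object() or the sampling options of plot_per_group() are the tools for this.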

Regards,
Christophe.
